Notation (ab)use for random variables, distributions, pdfs/pmfs

This question is about notation for random variables (RVs), distributions, and pdfs/pmfs, and about common (ab)uses of that notation, which recently confused me.

Let $X,Y$ denote random variables.

First, the notation I usually encounter (please correct me if I am wrong):

  • values an RV takes on are usually denoted by lowercase letters, so that $P(X=x) \in [0,1]$ denotes the probability of the RV $X$ taking on the value $x$
  • $X_1,\dots,X_n \sim X$ means "let $X_1,\dots,X_n$ be RVs with the same distribution as $X$" (often written $\overset{\text{iid}}{\sim}$ when they are also independent)
  • if $X$ is discrete, its pmf is usually denoted by $p(x) = p_X(x) = P(X=x) \in [0,1]$
  • if $X$ is non-discrete, its pdf is usually denoted by $f(x) = f_X(x) \in [0,\infty)$, or by $p(x) = p_X(x)$ so that discrete and non-discrete RVs can be discussed at the same time
  • the cdf is usually written as $F(x) = F_X(x) = P(X \leq x)$, which is a sum of the pmf or an integral of the pdf
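
To make the conventions in this list concrete, here is a minimal Python sketch for the discrete case. The fair six-sided die and the function names `pmf` and `cdf` are assumptions chosen purely for illustration; they do not come from any particular text.

```python
from fractions import Fraction

# Hypothetical example: X is the outcome of a fair six-sided die.
support = [1, 2, 3, 4, 5, 6]

def pmf(x):
    """p_X(x) = P(X = x); zero outside the support."""
    return Fraction(1, 6) if x in support else Fraction(0)

def cdf(x):
    """F_X(x) = P(X <= x), computed as the sum of the pmf over support points <= x."""
    return sum((pmf(k) for k in support if k <= x), Fraction(0))

print(pmf(3))  # 1/6
print(cdf(3))  # 1/2
print(cdf(6))  # 1
```

Note that `pmf` and `cdf` take a value $x$ as their argument, not the RV $X$ itself; the RV only determines which function is meant (the subscript in $p_X$ and $F_X$).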

The following notations I have so far either read in an "intuitive" way or assumed to be merely sloppy, but they have caused me some confusion:

  • "Let $X$ be a RV with distribution $X \sim P(X)$" -- What exactly is meant? Should I think of $P$-robability here or is it a symbol which reads "this denotes/represents the distribution of $X$"?
  • "$p(X,Y), p(X), p(X|Y)$ denote the joint, marginal, conditional probability density functions" -- How should I understand this? I mean, they should be functions of the values the RVs can take on, but here they take the RVs themselves as arguments?
  • "Let $P(x,y)$ be an (unknown) joint probability distribution on instances and labels $X \times Y$. Given a training sample $\{(x_i, y_i)\}_{i=1}^n \overset{\text{iid}}{\sim} P(x,y)$ ..." -- How to read this?
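
For concreteness, one possible reading of the third quote can be sketched in Python. This assumes that $P(x,y)$ is meant as a joint distribution over pairs and that the training sample consists of independent draws from it; whether the quoted authors intend exactly this is part of the question. The table `joint_pmf` and the helpers `draw_sample` and `marginal_x` are hypothetical names and values made up for the illustration.

```python
import random

# Hypothetical joint pmf on instances {0, 1} and labels {"a", "b"}.
# Keys are pairs (x, y); the values sum to 1.
joint_pmf = {
    (0, "a"): 0.4,
    (0, "b"): 0.1,
    (1, "a"): 0.2,
    (1, "b"): 0.3,
}

def draw_sample(n, seed=0):
    """Draw n pairs (x_i, y_i) independently from the joint pmf."""
    rng = random.Random(seed)
    pairs = list(joint_pmf)
    weights = [joint_pmf[pair] for pair in pairs]
    # choices() samples with replacement, so the draws are independent
    # and identically distributed.
    return rng.choices(pairs, weights=weights, k=n)

def marginal_x(x):
    """p_X(x): sum the joint pmf over all labels y."""
    return sum(p for (xi, _), p in joint_pmf.items() if xi == x)

sample = draw_sample(5)
print(sample)
print(marginal_x(0))
```

Under this reading, the lowercase arguments in $P(x,y)$ are placeholders for values, and the distribution itself is the whole table rather than any single entry of it.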

Could someone help me out and shed some light on the points above?

Sorry if my questions are stupid. I just feel that the notation gets far sloppier in applied material, and it would help me either to pin down what is actually meant or to know that I need to relax and learn to read it "sloppily but correctly".


Solution 1:

You are quite right. The second bullet list is full of muddled notation, which shows that the author has vague and confused ideas about probability. Rather than trying to interpret and learn from this sort of stuff, you would be much better off sticking to material by authors whose notation makes sense.