Is it possible for a set of random variables to each be highly correlated with another variable, but not highly correlated with each other?

Let $X_1, \dots, X_n$ and $Y$ be random variables. Is it possible for every $X_i$ to be strongly correlated with $Y$ (large absolute value of Pearson's $r$) while the $X_i$ are not strongly correlated with each other?


Solution 1:

It's mentioned here (reference included) that $d(X,Y)= \sqrt{1 - r_{X,Y}^2}$, where $r_{X,Y}$ is the correlation coefficient, is a proper distance; in particular, it satisfies the triangle inequality.

Then, if $d(X_i,Y) \le \epsilon$ for every $i$, the triangle inequality through $Y$ gives

$$d(X_i, X_j) \le d(X_i,Y) + d(Y,X_j) \le 2 \epsilon$$

If all the correlations with $Y$ are high, so that $|r_{X_i,Y}| \ge \rho \ge \sqrt{3/4} \approx 0.86603$, then taking $\epsilon = \sqrt{1-\rho^2}$ and squaring $\sqrt{1 - r_{X_i,X_j}^2} \le 2\sqrt{1-\rho^2}$ gives the bound

$$ |r_{X_i,X_j}| \ge 2 \sqrt{\rho^2 -\frac34}$$
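To get a feel for how strong this bound is, here is a minimal Python sketch. The function names `distance_bound` and `worst_case_pair` are made up for illustration. The "worst case" assumes the geometric picture in which correlations are cosines of angles, with two variables placed at angle $\arccos\rho$ on opposite sides of $Y$.

```python
import math

def distance_bound(rho):
    """Lower bound on |r(X_i, X_j)| from the triangle inequality for d = sqrt(1 - r^2).

    Returns 0.0 when rho < sqrt(3/4), where the bound says nothing.
    """
    return 2 * math.sqrt(max(0.0, rho**2 - 0.75))

def worst_case_pair(rho):
    """Correlation of two variables at angle arccos(rho) on opposite sides of Y.

    Assumption: correlations are treated as cosines of angles between
    centered variables, so the pair is separated by 2 * arccos(rho).
    """
    return math.cos(2 * math.acos(rho))

for rho in (0.80, 0.87, 0.90, 0.95, 0.99):
    print(f"rho={rho:.2f}  bound={distance_bound(rho):.4f}  "
          f"opposite-sides pair={worst_case_pair(rho):.4f}")
```

For every $\rho$ in the loop, the opposite-sides pair stays at or above the bound, so the inequality holds but leaves some slack. Below $\sqrt{3/4}$ the bound is zero and gives no information.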


Solution 2:

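Correlation is the cosine of the angle between the (centered) variables, and angles obey the triangle inequality, so

$\newcommand{\cor}{\operatorname{cor}}$ $$ \cor(A,C) \ge \cos\big( \arccos(\cor(A,B)) + \arccos(\cor(B,C)) \big) $$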

For example, if $\cor(A,B)=0.98$ and $\cor(B,C)=0.96$ then the two arccosines are respectively about $ 11.47834^\circ $ and $16.26026^\circ,$ and their sum is about $27.73855^\circ,$ whose cosine is about $0.8850807,$ so $\cor(A,C)$ cannot be smaller than that.
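The arithmetic in this example can be reproduced with a short Python sketch. The helper names `cor_lower_bound` and `corr_det` are hypothetical. The determinant check is an extra sanity test: a $3\times 3$ correlation matrix must have a non-negative determinant, and the determinant should vanish exactly at the bound.

```python
import math

def cor_lower_bound(r_ab, r_bc):
    """Smallest cor(A, C) compatible with the given cor(A, B) and cor(B, C)."""
    return math.cos(math.acos(r_ab) + math.acos(r_bc))

def corr_det(r_ab, r_bc, r_ac):
    """Determinant of the 3x3 correlation matrix of (A, B, C)."""
    return 1 + 2 * r_ab * r_bc * r_ac - r_ab**2 - r_bc**2 - r_ac**2

r_ab, r_bc = 0.98, 0.96
angle_ab = math.degrees(math.acos(r_ab))
angle_bc = math.degrees(math.acos(r_bc))
bound = cor_lower_bound(r_ab, r_bc)

print(f"angles: {angle_ab:.5f} + {angle_bc:.5f} = {angle_ab + angle_bc:.5f} degrees")
print(f"lower bound on cor(A, C): {bound:.7f}")
# At the bound the matrix is singular; below it the determinant goes negative.
print(f"determinant at the bound: {corr_det(r_ab, r_bc, bound):.2e}")
print(f"determinant just below:   {corr_det(r_ab, r_bc, bound - 0.01):.2e}")
```

A negative determinant just below the bound means no valid correlation matrix exists there, which agrees with the claim that $\cor(A,C)$ cannot be smaller.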

If $\cor(A,B)<0$ then $\cor(-A,B)>0$, and all of the above applies with $-A$ in place of $A$.

Daniel T. Kaplan's book Statistical Modeling: A Fresh Approach is quite explicit about this.