Published on June 2, 2016; tags: Mathematics, Probability

I have two correlated random variables, \(X\) and \(Y\), with zero mean and equal variance. I tell you that the best way to predict \(Y\) based on the knowledge of \(X\) is \(y = a x\). Now, you tell me, what is the best way to predict \(X\) based on \(Y\)?

Your intuition might tell you that if \(y = ax\), then \(x = y/a\). This is correct most of the time… but not here. The right answer will surprise you.

So *what is* the best way to predict \(Y\) based on \(X\) and vice versa? Let’s find the \(a\) that minimizes the mean squared error \(E[(Y-aX)^2]\):

\[E[(Y-aX)^2] = E[Y^2-2aXY+a^2X^2]=(1+a^2)\mathrm{Var}(X)-2a\mathrm{Cov}(X,Y);\]

\[\frac{\partial}{\partial a}E[(Y-aX)^2] = 2a\mathrm{Var}(X)-2\mathrm{Cov}(X,Y);\]

\[a=\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}=\mathrm{Corr}(X,Y).\]

Notice that the answer, the (Pearson) correlation coefficient, is symmetric w.r.t. \(X\) and \(Y\). Thus it will be the same whether we want to predict \(Y\) based on \(X\) or \(X\) based on \(Y\)!

How to make sense of this? It may help to consider a couple of special cases first.

First, suppose that \(X\) and \(Y\) are perfectly correlated and you’re trying to predict \(Y\) based on \(X\). Since \(X\) is such a good predictor, just use its value as it is (\(a=1\)).

Now, suppose that \(X\) and \(Y\) are uncorrelated. Knowing the value of \(X\) doesn’t tell you anything about the value of \(Y\) (as far as linear relationships go). The best predictor you have for \(Y\) is its mean, \(0\).

Finally, suppose that \(X\) and \(Y\) are *somewhat* correlated. The correlation coefficient is the degree to which we should trust the value of \(X\) when predicting \(Y\) versus sticking to \(0\) as a conservative estimate.

This is the key idea—to think about \(a\) in \(y=ax\) not as a degree of proportionality, but as a degree of “trust”.