Surprising reciprocity


\(X\) and \(Y\) are two correlated random variables with zero mean and equal variance. If the best linear prediction of \(Y\) based on \(X\) (in the least-squares sense) is \(y = a x\), what is the best linear prediction of \(X\) based on \(Y\)?

It is not \(x = y / a\)!

Let’s find the \(a\) that minimizes the mean squared error \(E[(Y-aX)^2]\):

\[E[(Y-aX)^2] = E[Y^2-2aXY+a^2X^2]=(1+a^2)\mathrm{Var}(X)-2a\mathrm{Cov}(X,Y)\]

(here we used \(E[X^2]=E[Y^2]=\mathrm{Var}(X)\) and \(E[XY]=\mathrm{Cov}(X,Y)\), which hold because the means are zero and the variances are equal);

\[\frac{\partial}{\partial a}E[(Y-aX)^2] = 2a\mathrm{Var}(X)-2\mathrm{Cov}(X,Y);\]

setting the derivative to zero gives

\[a=\frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}=\mathrm{Corr}(X,Y),\]

where the last equality holds because the variances of \(X\) and \(Y\) are equal.

Notice that the answer, the (Pearson) correlation coefficient, is symmetric w.r.t. \(X\) and \(Y\). Thus it will be the same whether we want to predict \(Y\) based on \(X\) or \(X\) based on \(Y\)!
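
As a quick numerical sanity check, here is a minimal sketch (assuming NumPy; the chosen correlation of \(0.6\) is purely illustrative) that draws zero-mean, equal-variance samples and fits the least-squares slope through the origin in both directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-mean, unit-variance pair with (illustrative) correlation 0.6.
rho = 0.6
cov = np.array([[1.0, rho],
                [rho, 1.0]])
x, y = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=100_000).T

# Least-squares slope through the origin: a = E[XY] / E[X^2].
a_y_from_x = np.sum(x * y) / np.sum(x * x)   # best a in y = a * x
a_x_from_y = np.sum(x * y) / np.sum(y * y)   # best a in x = a * y

print(a_y_from_x, a_x_from_y)       # both close to 0.6, not reciprocals of each other
print(np.corrcoef(x, y)[0, 1])      # and both match the sample correlation
```

Both fitted slopes come out close to the sample correlation rather than being reciprocals of each other.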

How can we make sense of this? It may help to consider a couple of special cases.

First, suppose that \(X\) and \(Y\) are perfectly correlated and you’re trying to predict \(Y\) based on \(X\). Since \(X\) is such a good predictor, just use its value as it is (\(a=1\)).

Now, suppose that \(X\) and \(Y\) are uncorrelated. Knowing the value of \(X\) doesn’t tell you anything about the value of \(Y\) (as far as linear relationships go). The best predictor you have for \(Y\) is its mean, \(0\).

Finally, suppose that \(X\) and \(Y\) are somewhat correlated. The correlation coefficient is the degree to which we should trust the value of \(X\) when predicting \(Y\) versus sticking to \(0\) as a conservative estimate.

This is the key idea: think of \(a\) in \(y=ax\) not as a degree of proportionality, but as a degree of “trust”.
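
To put numbers on this, here is another small sketch (same assumptions as above: NumPy, illustrative correlation \(0.6\)) comparing the mean squared error of trusting \(X\) fully (\(a=1\)), ignoring it (\(a=0\)), and scaling it by the correlation (\(a=\mathrm{Corr}(X,Y)\)):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.6  # illustrative correlation
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=100_000).T

for a, label in [(1.0, "full trust, a = 1"),
                 (0.0, "just the mean, a = 0"),
                 (rho, "scaled by correlation, a = rho")]:
    mse = np.mean((y - a * x) ** 2)
    print(f"{label:32s} MSE = {mse:.3f}")

# Expected values from the formula above: 2(1 - rho) = 0.8 for a = 1,
# 1.0 for a = 0, and 1 - rho**2 = 0.64 for a = rho (the smallest).
```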


Added on 2020-08-21:

There was an interesting related discussion on Twitter: