Where did all my probability go?

Published on

Let X be a one-dimensional normal variable with mean 0 and standard deviation σ. What is the limit of P(X>a|σ) when σ tends to infinity?

We can compute the probability P(X>a|σ) as an integral of the normal density function:

$$P(X>a\mid\sigma)=\int_a^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx.$$

As σ goes to infinity, $\frac{1}{\sqrt{2\pi}\,\sigma}$ goes to 0, and $e^{-x^2/2\sigma^2}$ goes to 1. Their product, therefore, goes to 0, and so

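$$\lim_{\sigma\to+\infty}P(X>a\mid\sigma)=\int_a^{+\infty}\lim_{\sigma\to+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx=0.$$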

But this does not make sense, does it? As the standard deviation increases, the normal distribution becomes more spread out, and the probability that X will exceed a fixed threshold a should increase, not fall down to 0.
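The intuition can be checked numerically. The sketch below assumes the standard identity P(X>a|σ) = ½·erfc(a/(σ√2)) for a zero-mean normal variable; the function name `tail_probability` and the threshold a = 1 are illustrative choices, not part of the original argument.

```python
import math

def tail_probability(a: float, sigma: float) -> float:
    """P(X > a) for X ~ Normal(0, sigma^2), via the complementary error function."""
    return 0.5 * math.erfc(a / (sigma * math.sqrt(2.0)))

a = 1.0  # an arbitrary fixed threshold, chosen only for illustration
for sigma in (1, 10, 100, 1000):
    print(f"sigma={sigma:>5}: P(X > {a}) = {tail_probability(a, sigma):.4f}")
```

For this positive threshold the printed probabilities grow with σ instead of shrinking toward 0.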

Indeed, the same argument could be used to “show” that the probability $P(X<a\mid\sigma)$ also tends to 0 as σ increases. So where does all the probability go?

The problem in our reasoning is swapping the integral and the limit operations, i.e. equating

$$\lim_{\sigma\to+\infty}\int_a^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx$$

to

$$\int_a^{+\infty}\lim_{\sigma\to+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx.$$

But why exactly this is a problem requires a little digging.

Proper integrals

We might suspect that the reason swapping the integral and the limit didn’t work is that we are integrating over an infinite interval $[a,+\infty)$; i.e. that our integral is improper.

One reason to think so is that we would be right to conclude that $\lim_{\sigma\to+\infty}P(a\le X\le b\mid\sigma)=0$, where the integration happens over a finite interval $[a,b]$.

The improperness of the integral does play a role, as we shall see below; but by itself, properness is neither necessary nor sufficient to swap the integral and the limit.

A. Ya. Dorogovtsev in his mathematical analysis textbook gives the following example (section 13.2.3):

$$f(x,y)=\frac{1}{y}\left(1-x^{1/y}\right)x^{1/y},$$

where $x\in[1/2,1]$ and $y\in(0,1]$.

For any given y, $f(\cdot,y)\colon[1/2,1]\to\mathbb{R}$ is a continuous function on a closed interval, so $\int_{1/2}^{1}f(x,y)\,dx$ is proper. And yet, it can be shown that

$$\forall x\in[1/2,1]\colon\ \lim_{y\to 0+}f(x,y)=0\quad\text{while}\quad\lim_{y\to 0+}\int_{1/2}^{1}f(x,y)\,dx=1/2.$$
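Both limits can be probed numerically. The following is only a sketch: it takes f(x,y) = (1/y)(1 − x^(1/y))·x^(1/y) and approximates the integral with a plain midpoint rule. The helper name `integral_midpoint`, the grid size, and the sample point x = 0.9 are assumptions made for illustration.

```python
def f(x: float, y: float) -> float:
    """Dorogovtsev's example: f(x, y) = (1/y) * (1 - x**(1/y)) * x**(1/y)."""
    t = x ** (1.0 / y)
    return (1.0 - t) * t / y

def integral_midpoint(y: float, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of f(., y) over [1/2, 1]."""
    h = 0.5 / n  # width of each of the n subintervals of [1/2, 1]
    return h * sum(f(0.5 + (i + 0.5) * h, y) for i in range(n))

for y in (0.5, 0.1, 0.02, 0.005):
    print(f"y={y}: f(0.9, y)={f(0.9, y):.3e}, integral={integral_midpoint(y):.4f}")
```

As y decreases, the value at the fixed point x = 0.9 collapses toward 0 while the approximate integral creeps up toward 1/2.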

Let’s look at the graph of f(x,y) for different values of y to see what’s going on.

Graph of $f(\cdot,y)$ for decreasing values of y.

On the one hand, at any given x, f(x,y) tends to 0. (Look at how the function value changes for a fixed x, say, x=0.8 or x=0.9. It may be less obvious that the same is true for x=0.99; you’ll have to trust that the trend continues or verify it analytically.)

On the other hand, the function as a whole does not seem to converge to 0. This is formalized by the notion of uniform convergence.

We say that $g(x,y)$ converges uniformly to $g(x)$ when $y\to y_0$ iff the maximum vertical distance between the graphs of $g(x,y)$ and $g(x)$, i.e. $\sup_x|g(x,y)-g(x)|$, goes to 0.

Uniform convergence on $[a,b]$ is sufficient (although not necessary) to replace $\lim_{y\to y_0}\int_a^b g(x,y)\,dx$ with $\int_a^b\lim_{y\to y_0}g(x,y)\,dx$.
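A one-line estimate shows why uniform convergence suffices on a finite interval. It assumes that $g(\cdot,y)$ and the limit function $g$ are integrable on $[a,b]$, which the text does not state explicitly:

$$\left|\int_a^b g(x,y)\,dx-\int_a^b g(x)\,dx\right|\le\int_a^b|g(x,y)-g(x)|\,dx\le(b-a)\sup_{x\in[a,b]}|g(x,y)-g(x)|\to 0\quad(y\to y_0).$$

The factor $(b-a)$ is finite only because the interval is bounded, which is where improper integrals need extra care.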

Clearly, our $f(x,y)$ converges to 0 non-uniformly: not only does $\sup_x|f(x,y)|$ not tend to 0, it tends to infinity. Non-uniform convergence allows a travelling wave. Such a wave can contribute to the integral while escaping the point-wise convergence analysis (the function converges at every point, but only after the wave has passed).
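The growth of the supremum can be made concrete. Writing t = x^(1/y), we get f = t(1 − t)/y, which is largest at t = 1/2, i.e. at x = 2^(−y), where it equals 1/(4y). The sketch below compares that value with a brute-force maximum over a grid; the helper name `sup_on_grid` and the grid size are illustrative assumptions.

```python
def f(x: float, y: float) -> float:
    """f(x, y) = (1/y) * (1 - x**(1/y)) * x**(1/y) on [1/2, 1] x (0, 1]."""
    t = x ** (1.0 / y)
    return (1.0 - t) * t / y

def sup_on_grid(y: float, n: int = 100_000) -> float:
    """Largest value of f(., y) over an evenly spaced grid on [1/2, 1]."""
    return max(f(0.5 + 0.5 * i / n, y) for i in range(n + 1))

for y in (0.5, 0.1, 0.01):
    peak_x = 2.0 ** (-y)  # location of the crest of the wave
    print(f"y={y}: grid sup={sup_on_grid(y):.4f}, 1/(4y)={1 / (4 * y):.4f}, crest at x={peak_x:.4f}")
```

The crest moves toward x = 1 and grows taller as y shrinks, which is the wave that the point-wise analysis misses.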

Improper integrals

Let’s revisit our first example, the limit $\lim_{\sigma\to+\infty}P(X>a\mid\sigma)=\lim_{\sigma\to+\infty}\int_a^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx$.

A piece of the right tail of a normal distribution, P(X|σ), for increasing values of σ.

Here again there is a wave traveling to the right (and a symmetric one traveling to the left, not shown here).

Because the wave travels with a finite speed, we may suspect non-uniform convergence: no matter how large σ is, there are always distant enough points at which the density will temporarily increase as σ continues to increase.
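The temporary increase at a distant point can be seen directly. Differentiating the density with respect to σ shows that, at a fixed x, it rises while σ < |x| and falls afterwards, so the crest passes x when σ = |x|. The sketch below evaluates the density at the arbitrarily chosen point x = 10; the function name `density` is an illustrative assumption.

```python
import math

def density(x: float, sigma: float) -> float:
    """Density of Normal(0, sigma^2) evaluated at x."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

x = 10.0  # a fixed "distant" point, chosen only for illustration
for sigma in (1, 5, 10, 20, 100):
    print(f"sigma={sigma:>3}: density at x={x} is {density(x, sigma):.3e}")
```

The printed values climb until σ reaches 10 and decline after that.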

And yet, the height of the wave diminishes, and $\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}$ does converge to 0 uniformly:

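$$\sup_x\left|\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\right|=\frac{1}{\sqrt{2\pi}\,\sigma}\to 0\quad(\sigma\to+\infty).$$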

The reason why this is not enough to bring the limit under the integral is that here we are dealing with an improper integral. Improper integrals are themselves limits; recall that the improper integral is defined as

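$$\int_a^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx=\lim_{b\to+\infty}\int_a^{b}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx.$$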

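Before we get to bring $\lim_{\sigma\to+\infty}$ under $\int_a^b$, we need to sneak it past $\lim_{b\to+\infty}$ first. For that, the limit $\lim_{b\to+\infty}\int_a^b\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx$ itself needs to converge uniformly as a function of σ; i.e. $\sup_\sigma\left|\int_b^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx\right|\to 0$ $(b\to+\infty)$.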

Look, this is the same quantity that we started with: because the integral is a positive and increasing function of σ,

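$$\sup_\sigma\left|\int_b^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx\right|=\lim_{\sigma\to+\infty}\int_b^{+\infty}\frac{1}{\sqrt{2\pi}\,\sigma}e^{-x^2/2\sigma^2}\,dx=\lim_{\sigma\to+\infty}P(X>b\mid\sigma).$$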

If we assume that $\forall b\colon\lim_{\sigma\to+\infty}P(X>b\mid\sigma)=0$, then the integral converges uniformly, we can bring the limit under the integral, and prove that indeed, $\forall a\colon\lim_{\sigma\to+\infty}P(X>a\mid\sigma)=0$.

But there is no reason to believe that the limit is zero, and that explains why our calculation yielding 0 was wrong: the integral does not converge uniformly.
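A numerical experiment suggests how badly uniformity fails. The tail probability depends on b and σ only through the ratio b/σ, so however far out b is pushed, a proportionally larger σ restores the same tail mass. The sketch below assumes the identity P(X>b|σ) = ½·erfc(b/(σ√2)); the name `tail_probability` and the multipliers k are illustrative.

```python
import math

def tail_probability(b: float, sigma: float) -> float:
    """P(X > b) for X ~ Normal(0, sigma^2)."""
    return 0.5 * math.erfc(b / (sigma * math.sqrt(2.0)))

# For each cutoff b, try sigma = k * b with growing k.
for b in (1.0, 100.0, 10_000.0):
    row = ", ".join(f"{tail_probability(b, k * b):.4f}" for k in (1, 10, 100, 1000))
    print(f"b={b:>8}: {row}")
```

Every row prints the same values, so the supremum over σ does not shrink as b grows.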

The actual calculation of $\lim_{\sigma\to+\infty}P(X>a\mid\sigma)$ is simple: because $\forall a>0\colon\lim_{\sigma\to+\infty}P(-a\le X\le a\mid\sigma)=0$ (thanks to uniform convergence!), all the probability mass goes to infinity, and because of the symmetry, each tail gets a half of it; so

$$\lim_{\sigma\to+\infty}P(X>a\mid\sigma)=1/2.$$
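The same value can be cross-checked by standardizing X. Here $\Phi$ denotes the standard normal cumulative distribution function, a symbol introduced only for this check. Since $X/\sigma$ is standard normal and $\Phi$ is continuous,

$$P(X>a\mid\sigma)=P\!\left(\frac{X}{\sigma}>\frac{a}{\sigma}\right)=1-\Phi\!\left(\frac{a}{\sigma}\right)\to 1-\Phi(0)=\frac{1}{2}\quad(\sigma\to+\infty).$$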

But I thought it would be instructive to investigate why the naive calculation did not work, so here we are.