Is differentiability arbitrary?

Published on May 1, 2018

A function \(f\) is called differentiable at point \(x_0\) if it can be approximated by a linear function near \(x_0\).

More formally, \(f\) is differentiable at \(x_0\) iff for some number \(A\) and for all \(x\)

\[ f(x)=f(x_0)+A\cdot(x-x_0) + o(x-x_0). \]

Here \(f(x_0)+A\cdot(x-x_0)\) is the linear function of \(x\) that approximates \(f(x)\) near \(x_0\), and \(o(x-x_0)\) is the approximation error — a function of \(x\) such that \(\frac{o(x-x_0)}{x-x_0}\) tends to 0 when \(x-x_0\) tends to 0.
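The definition can be checked numerically. The sketch below is only an illustration: the function \(f(x)=x^3\), the point \(x_0=1\), the coefficient \(A=3\) and the helper name `error_ratio` are my own choices, not from the text. It computes the approximation error divided by \(x-x_0\) for shrinking steps \(h=x-x_0\). A finite list of steps cannot prove that a limit is 0, but it shows the ratio shrinking as the definition requires.

```python
def error_ratio(f, x0, A, h):
    """Return (f(x0 + h) - f(x0) - A*h) / h: the error of the linear
    approximation divided by x - x0 = h."""
    return (f(x0 + h) - f(x0) - A * h) / h

# Illustrative choice: f(x) = x**3 at x0 = 1, where the derivative is A = 3.
f = lambda x: x ** 3
steps = [10.0 ** (-k) for k in range(1, 6)]
ratios = [error_ratio(f, 1.0, 3.0, h) for h in steps]

for h, r in zip(steps, ratios):
    # Exactly, the ratio equals 3h + h^2, so it shrinks roughly like 3h.
    print(f"h = {h:.0e}   error / h = {r:.6f}")
```

With any other value of \(A\) the ratio would instead approach \(3-A\neq 0\).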

There is something arbitrary about this definition. Why are we approximating \(f\) by a linear function, and not by the square function \(x^2\), or the square root function \(\sqrt{x}\), or the sine function \(\sin x\)?

In my experience, this is almost never explained by high school teachers or university professors who introduce differentiability to students. At best they may say that linear functions play a very important role in mathematics which, while true, is just begging the question.

Approximating by the square function

If a function can be approximated by the square function, i.e.

\[ f(x)=f(x_0)+B\cdot(x-x_0)^2 + o((x-x_0)^2), \]

then it can also be approximated by a linear function; more specifically, by the constant function \(f(x_0)\). This is because \((x-x_0)^2=o(x-x_0)\), so the whole term \(B\cdot(x-x_0)^2 + o((x-x_0)^2)\) can be put into the approximation error \(o(x-x_0)\) in the usual definition, and \(A\) can be set to zero.
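Here is a small numerical illustration, assuming the example \(f(x)=1-\cos x\) near \(x_0=0\) (my choice, not the article's). For this function \(f(x)=f(0)+\tfrac12 x^2+o(x^2)\), so \(B=\tfrac12\). The same values also fit the linear definition with \(A=0\), because the error divided by \(x\) goes to 0.

```python
import math

# Illustrative choice: f(x) = 1 - cos(x) near x0 = 0, where
# f(x) = f(0) + (1/2) * x**2 + o(x**2), i.e. B = 1/2.
f = lambda x: 1.0 - math.cos(x)
steps = [10.0 ** (-k) for k in range(1, 5)]

# Ratio to the square term: should approach B = 1/2.
quad_ratios = [(f(h) - f(0.0)) / h ** 2 for h in steps]
# Error of the constant approximation (A = 0) divided by h: should approach 0,
# so the quadratic approximation is also a (constant) linear one.
lin_ratios = [(f(h) - f(0.0) - 0.0 * h) / h for h in steps]

for h, q, l in zip(steps, quad_ratios, lin_ratios):
    print(f"h = {h:.0e}   /h^2 = {q:.6f}   /h = {l:.2e}")
```

The steps stop at \(10^{-4}\) because \(1-\cos h\) loses precision to floating-point cancellation for much smaller \(h\).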

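The converse does not hold, however: a differentiable function with zero derivative need not admit such a quadratic approximation. For example, \(f(x)=|x|^{3/2}\) has \(f'(0)=0\), but \(\frac{f(x)-f(0)}{x^2}=|x|^{-1/2}\) is unbounded as \(x\to 0\), so no constant \(B\) works.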

Approximating by the square root function

Consider a function that can be approximated by the square root function:

\[ f(x)=f(x_0)+B\cdot\sqrt{x-x_0} + o(\sqrt{x-x_0}). \]

An example would be the square root function itself, \(f(x)=\sqrt{x}\), near \(x_0=0\) (with \(B=1\)).
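A quick numerical check of this example, using only steps \(h=x-x_0>0\) because the square root is defined only for \(x\geq x_0\). The ratio to \(\sqrt{h}\) is exactly \(B=1\), while the ordinary difference quotient \(\frac{f(h)-f(0)}{h}=\frac{1}{\sqrt h}\) grows without bound, so \(\sqrt{x}\) is not differentiable at 0 in the usual sense.

```python
import math

f = math.sqrt  # the square root function, approximated at x0 = 0
steps = [10.0 ** (-k) for k in range(1, 7)]  # only h > 0: sqrt needs x >= x0

# Ratio to the square-root term: exactly B = 1 for every h.
sqrt_ratios = [(f(h) - f(0.0)) / math.sqrt(h) for h in steps]
# Ordinary difference quotient: equals 1/sqrt(h) and grows without bound,
# so no finite A works in the linear definition.
diff_quotients = [(f(h) - f(0.0)) / h for h in steps]

for h, s, d in zip(steps, sqrt_ratios, diff_quotients):
    print(f"h = {h:.0e}   /sqrt(h) = {s:.3f}   /h = {d:.1f}")
```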

But is this property “sustainable”? It can hold at a single point \(x_0\), but can it hold for all points in an interval in the same way as a function can be differentiable at all points of an interval?

Suppose that

\[ f(x)=f(x_0)+B(x_0)\cdot\sqrt{x-x_0} + o(\sqrt{x-x_0}) \]

for all \(x_0\) in \([a,b]\). Pick an integer \(M>0\) and divide \([a,b]\) into \(M\) equal sub-intervals \[\left[a+(b-a)\cdot \frac{m}{M}, a+(b-a)\cdot \frac{m+1}M\right],\] \(0\leq m\leq M-1\).

Then

\[ \begin{split} f(b)-f(a) & =\sum_{m=0}^{M-1} \left[f\left(a+(b-a)\cdot \frac{m+1}M\right)-f\left(a+(b-a)\cdot \frac{m}{M}\right)\right] \\ & =\sum_{m=0}^{M-1} \left[B\left(a+(b-a)\cdot \frac{m}M\right)\sqrt{\frac{b-a}M} + o\left(\sqrt{\frac{b-a}M}\right)\right]. \end{split} \]

You can already notice that something is strange. Suppose that \(B(x)\) does not depend on \(x\), \(B(x)\equiv B\neq 0\). If the error terms are small uniformly in \(x_0\), the right-hand side above reduces to \(B\sqrt{M(b-a)}+o(\sqrt{M})\), whose absolute value goes to infinity as \(M\) increases; it cannot possibly remain equal to \(f(b)-f(a)\).
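To see the growth concretely, here is a sketch of the leading part of the telescoping sum for a constant \(B\). The values \(B=1\), \([a,b]=[0,1]\) and the helper name `main_term` are hypothetical choices for illustration. Each of the \(M\) sub-intervals contributes \(B\sqrt{(b-a)/M}\), so the total is \(B\sqrt{M(b-a)}\), whereas \(f(b)-f(a)\) is a fixed number.

```python
import math

def main_term(B, a, b, M):
    """Sum over the M sub-intervals of B * sqrt(sub-interval length): the
    leading part of the telescoping sum for f(b) - f(a) when B(x) = B."""
    width = (b - a) / M
    return sum(B * math.sqrt(width) for _ in range(M))

# Hypothetical values for illustration: B = 1 on [a, b] = [0, 1].
for M in [1, 4, 16, 64, 256]:
    # Both columns equal sqrt(M * (b - a)) up to rounding.
    print(M, main_term(1.0, 0.0, 1.0, M), math.sqrt(M * (1.0 - 0.0)))
```

Refining the partition makes the sum larger and larger instead of settling down to \(f(b)-f(a)\).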

More generally (but for the same reason), there can't be any point in \([a,b]\) where \(B\) is continuous and not equal to 0, or any sub-interval of \([a,b]\) where \(B(x)>\epsilon>0\) (or, symmetrically, \(B(x)<-\epsilon<0\)).

So this approximation is rather weird unless \(B(x)\equiv 0\).

Approximating by the sine function

What happens if we try to approximate by, say, \(\sin x\)?

\[ f(x)=f(x_0)+B\cdot\sin(x-x_0) + o(\sin(x-x_0)). \]

To be careful, we need to restrict this approximation to \(x\) in some neighborhood of \(x_0\). We didn't have to do this before because, when \(x\) is not close to \(x_0\), an error term like \(o((x-x_0)^{\alpha})\) (with \(\alpha=1\), \(2\) or \(\tfrac12\) as above) can be anything at all. Here, however, \(\sin(x-x_0)\) can be small, and even zero (for instance at \(x-x_0=\pi\)), when \(x-x_0\) is not small.

We know that \(\sin x\) is itself differentiable in the usual sense, its derivative at 0 being 1:

\[ \sin (x-x_0)=(x-x_0) + o(x-x_0). \]

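Therefore, since \(\frac{\sin(x-x_0)}{x-x_0}\to 1\) as \(x\to x_0\), any \(o(\sin(x-x_0))\) is also \(o(x-x_0)\), and we get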

\[ f(x)=f(x_0)+B\cdot(x-x_0) + o(x-x_0), \]

i.e. \(f\) is differentiable in the usual sense as well, and vice versa.
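A numerical sketch of this equivalence, assuming the example \(f(x)=e^x\) at \(x_0=0\) (my choice, not from the text), where \(A=B=f'(0)=1\). The ratio \(\sin h/h\) tends to 1, and the error measured against \(\sin h\) and against \(h\) both shrink to 0 at the same rate (roughly \(h/2\)).

```python
import math

# Illustrative choice: f = exp near x0 = 0, where f'(0) = 1, so B = A = 1.
f = math.exp
steps = [10.0 ** (-k) for k in range(1, 6)]

sin_ratios = [math.sin(h) / h for h in steps]  # tends to 1
# Error of the "sine" approximation, relative to sin(h).
err_vs_sin = [(f(h) - f(0.0) - math.sin(h)) / math.sin(h) for h in steps]
# Error of the usual linear approximation, relative to h.
err_vs_lin = [(f(h) - f(0.0) - h) / h for h in steps]

for h, s, e1, e2 in zip(steps, sin_ratios, err_vs_sin, err_vs_lin):
    print(f"h = {h:.0e}   sin/h = {s:.8f}   vs sin: {e1:.2e}   vs h: {e2:.2e}")
```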

Conclusion

Generalizing from these examples, let’s say that we approximate a function \(f\) with a function \(g\) in the neighborhood of \(x_0\):

\[ f(x)=f(x_0)+B\cdot g(x-x_0) + o(g(x-x_0)), \]

where \(g(0)=0\) so that \(o(g(x-x_0))\) has the usual meaning.

Then:

If \(g\) is itself differentiable at 0 and \(g'(0)\neq 0\) (as in \(g(x)=\sin x\)), we get the usual class of differentiable functions.
If \(g\) is differentiable at 0 and \(g'(0)=0\) (as in \(g(x)=x^2\)), then we get a subclass of differentiable functions for which the derivative at \(x_0\) is 0.
If \(g\) itself is not differentiable at 0 (as in \(g(x)=\sqrt x\)), then, unless \(B\) is zero, the property either holds only at isolated points or describes some strange (non-smooth) functions.
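The three cases differ in how \(g(h)/h\) behaves as \(h\to 0\). The sketch below is a heuristic, not a proof: it evaluates this ratio at a few small positive steps for the three example functions. The helper name `ratios_at_zero` is hypothetical. The ratio stays near 1 for \(\sin x\), tends to 0 for \(x^2\), and blows up for \(\sqrt x\).

```python
import math

def ratios_at_zero(g, steps):
    """Return g(h)/h for each step h > 0: a numerical stand-in for g'(0)."""
    return [g(h) / h for h in steps]

steps = [10.0 ** (-k) for k in range(2, 9)]
candidates = {
    "sin x": math.sin,       # g'(0) = 1: the usual differentiable functions
    "x^2": lambda x: x * x,  # g'(0) = 0: the zero-derivative subclass
    "sqrt x": math.sqrt,     # no finite g'(0): the ratios blow up
}
for name, g in candidates.items():
    r = ratios_at_zero(g, steps)
    print(f"{name:7s} g(h)/h at h = 1e-8: {r[-1]:.3g}")
```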

This doesn’t quite tell us why we pick \(x\) instead of \(\sin x\) — the answer would be some hand-wavy simplicity arguments — but at least it reassures us that we are not missing much by focusing on linear approximations.