Torsors, midpoints, and homogeneous coordinates
Published on
Gustavo Goretkin writes:
Dear Roman,
I’ve thought about torsors for a while but only came across your page on it today.
Regarding the distinction between Vectors and Points, I think it’s useful to have this distinction in programming languages. Graphics people sometimes do this when they use normalized homogeneous coordinates. A point \(P(x,y,z)\) is represented as \([x,y,z,1]\) and a vector \(V(x,y,z)\) is represented as \([x,y,z,0]\). If you add a \(V+V\), you get a \(V\), if you add \(P+V\), you get a \(P\). If you do \(P-P\), you get a \(V\). Finally, If you add \(P+P\), you get a homogeneous coordinate with last coordinate 2.
Now, I don’t know how to attach a geometric meaning to the point \(P_1+P_2\), however, I think it is convenient to allow the average of \(P_1\) and \(P_2\) to be the point in between \(P_1\) and \(P_2\)—halfway along the geodesic connecting the two points. Which is what happens if you normalize the homogeneous point above, because you divide all coordinates by two.
If I were using a programming language that did not allow me to add two points, I know what I’d do. I’d choose an arbitrary point \(c\) and perform
\[\frac{(P_1-c) + (P_2-c)}2 + c.\]
This feels like a dilemma, though, because this produces the identical result as if you had allowed \(\frac{P_1+P_2}2\). Perhaps my dilemma is purely rooted in pragmatics, but I wanted to know if you had any thoughts.
Is this even legal?
So we have two ways to compute the midpoint of \(P_1P_2\). One is legal under the points-as-a-torsor interpretation and does not involve point addition:
\[\frac{(P_1-c) + (P_2-c)}2 + c,\]
where \(c\) is arbitrary. A simpler (and still legal) formula is obtained by letting \(c=P_1\):
\[\frac{P_2-P_1}2 + P_1.\]
On the other hand, we have a formula that “just works” if we allow point addition:
\[\frac{P_1+P_2}2.\]
Should the existence of this latter formula cast a shadow of doubt on the first two and the concept of a midpoint? Certainly not!
Why does the illegal formula work? Any torsor can be transformed into a proper linear space by choosing an arbtrary element of the torsor, \(c\), and designating it as zero. Then we can compute any linear combination \(\alpha P_1+\beta P_2\) just as we computed the midpoint above, \(\alpha(P_1-c) + \beta(P_2 - c) + c.\)
The problem with this approach, in general, is that the result will vary with the choice of \(c\). Indeed,
\[ (\alpha(P_1-c_1) + \beta(P_2 - c_1) + c_1) - \\ (\alpha(P_1-c_2) + \beta(P_2 - c_2) + c_2) = \\ (\alpha+\beta-1)(c_2-c_1) \]
Thus, the result will be independent of \(c\) only in the case \(\alpha+\beta=1\), that is, when the linear combination is affine. Luckily, computing the midpoint happens to be such a combination, with \(\alpha=\beta=1/2\).
Interface vs. implementation
If we wish to design a safer language that does not let us accidentally add points, we should not allow expressions like \(\alpha P_1 + \beta P_2\), unless we can statically verify that \(\alpha + \beta = 1\). But since, as we’ve established, an affine combination of two (and, by extension, \(n\)) points does make geometric sense, we would want to introduce a function like the following:
\[ \mathrm{affine}(\alpha_1,P_1;\alpha_2,P_2;\ldots;\alpha_n,P_n) = \frac{\alpha_1 P_1+\alpha_2 P_2+\ldots+\alpha_n P_n}{\alpha_1 +\alpha_2 +\ldots+\alpha_n} \label{affine} \]
The division by \(\sum \alpha_i\) ensures that the combination is indeed affine and thus legal.
Internally, the function may be computed by any of the formulae shown above, even if some of them may be not allowed in the surface language. Usually, the straightforward definition above will be the fastest. A smarter algorithm could be employed to achieve better numeric stability. And in rare cases such as dealing with bounded numbers, the formula \(\frac{P_2-P_1}2 + P_1\) is required to avoid overflow.
Homogeneous coordinates
It may seem as if our \(\mathrm{affine}\) function is equivalent to addition in homogeneous coordinates; but there is one important distinction. Homogeneous coordinates are stateful; the sum \(\sum\alpha_i\) is simply stored along the vector \(\sum\alpha_iP_i\), and the division is implicit. Thus two geometrically identical points may behave differently when added to a third point:
\[ [6,6,6,1] + [0,0,0,1] = [6,6,6,2] \sim (3,3,3) \\ [12,12,12,2] + [0,0,0,1] = [12,12,12,3] \sim (4,4,4) \]
Homogeneous coordinates are a very powerful concept, but as a programming model they are error-prone. A safer model could be devised if the language can statically distinguish (say, through types) between points, vectors, and possibly denormalized points.