Algebra of permutations in R
Published on
A permutation is a mathematical function that maps numbers from the set \(\{1,2,\ldots,n\}\) to other numbers from the same set.
An example of a permutation for \(n=5\) would be
\[ s=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix} \]
This two-line notation denotes a function \(s\) such that \(s(1)=2\), \(s(2)=5\) and so on.
In R, we can represent a permutation by its second row, and assume that the first row implicitly consists of the numbers from \(1\) to \(n\) in the ascending order:
s <- c(2, 5, 4, 3, 1)
An R vector represents a permutation if and only if it consists of
n
integers between 1
and n
, and
each integer occurs in the vector exactly once.
Here is how the mathematics of permutations maps to R:
- The application \(s(i)\) (where
\(i\in \{1,\ldots,n\}\)) corresponds to
s[i]
. - The composition of permutations \(s\) and \(t\), \(s \circ
t\), corresponds to the vector
s[t]
. - the identity permutation, \(\mathrm{id}\), corresponds to the vector
1:n
, becauses[1:n] == s
. - The inverse of a permutation \(s\),
\(s^{-1}\), corresponds to the vector
order(s)
. This is becauses[order(s)] == 1:s
. sample(n)
gives a random permutation of sizen
.
The main use for permutations in R is to rearrange the data. If
s
is a permutation vector and d
is an
arbitrary vector (i.e. its elements are not necessarily numbers from
1
to n
), then d[s]
is some
rearrangement of d
. Specifically,
d[s][i] == d[s[i]]
, so the i
th element of
d[s]
is the s[i]
th element of
d
.
Often we need to solve the reverse problem: we need to move the
i
th element of d
to the position
s[i]
in the resulting vector. Because the inverse
permutation of s
is order(s)
,
d[order(s)]
is exactly what we need.
As I explain in rank
vs. order in R, rank(a)
and order(a)
are
inverse permutations of each other. But above I said that
order(s)
is the inverse of s
itself. It
follows, therefore, that when s
is a permutation vector,
rank(s)
coincides with s
itself:
<- sample(100)
s all(rank(s) == s)
[1] TRUE
Example: sort monthly data
As an example, let’s use the algebra of permutations to put some monthly data in the chronological order:
set.seed(2017-11-18)
<- rpois(12, 50)
v names(v) <- sample(month.abb)
v
Oct Mar Jul Aug Sep Nov Dec Apr Jun Jan May Feb
50 47 48 53 42 56 61 59 56 57 50 58
The proper chronological order is given by the month.abb
vector:
[1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"
Because our goal is to rearrange the vector v
, the
solution will have the form v[order(s)]
, where the
permutation s
specifies, for each index i
,
that the i
th element should be moved to the
s[i]
th position.
Our strategy will be to build s
as a composition of two
permutations, x[y]
, where y
transforms our
random order into the alphabetical order of month names, and
x
then transforms the alphabetical order to the
chronological order. (Remember that x[y][i] == x[y[i]]
, so
y
is the permutation that gets applied first.)
We need y
to send the month name
names(v)[i]
to its correct position in the
alphabetically-sorted vector. This position is given by
rank(names(v))[i]
:
<- rank(names(v)) y
Let’s now consider x
. It must send an index
i
in the alphabetically sorted vector to the corresponding
index in the chronological vector. rank(month.abb)
sends
indexes from the chronological vector to the alphabetical vector, so
x
must be the inverse of that:
<- order(rank(month.abb)) x
Let’s see if we got x
and y
right:
<- x[y]
s order(s)] v[
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
57 58 47 59 50 56 48 53 42 50 56 61
Yay!
Now, let’s simplify our code. Let’s denote
rank(names(v))
as \(r_1\)
and rank(month.abb)
as \(r_2\), and recall that order
of a permutation is its inverse.
Then \(y\) is \(r_1\), \(x\) is \(r_2^{-1}\), and order(s)
is
\[ s^{-1} = (xy)^{-1} = (r_2^{-1} r_1)^{-1} = r_1^{-1} r_2. \]
Which, when translated back to R, corresponds to the permutation
order(rank(names(v)))[rank(month.abb)]
Can we also simplify order(rank(names(v)))
? You may
remember that order
and rank
are inverse of
each other and cancel them out. That would be a mistake. For any
a
(without duplicate elements), order(a)
and
rank(a)
are inverse permutations
(order(a)[rank(a)] == 1:length(a)
); however,
order
and rank
are not inverse
functions. It turns out that order(rank(a))
is
simply order(a)
, because rank
preserves the
order of the elements.
Our final version is therefore
order(names(v))[rank(month.abb)]] v[
One might arrive at this result via an easier route, perhaps
observing that v[order(names(v))]
gives the alphabetically
sorted vector and [rank(month.abb)]
then restores the
chronological order, but it may be less obvious in more complicated
problems.
Also, as I show in rank vs. order, if you do this by trial and error and use small vectors to test your code, you can easily arrive at something that looks right but isn’t. Let’s say your data had only 4 months worth of data:
set.seed(2)
month.abb <- month.abb[1:4]
v <- rpois(4, 50)
names(v) <- sample(month.abb)
v
Jan Mar Apr Feb
43 43 42 50
Then this may look like the right solution (it’s not!):
rank(names(v))[rank(month.abb)]] v[
Jan Feb Mar Apr
43 50 43 42
So here’s how I recommend to approach such rearrangement problems:
- Start with
v[order(s)]
, transforming the problem into “where do I send thei
th element” - Find or derive
s
- Optionally, simplify
order(s)
using the algebra of permutations