Algebra of permutations in R

Published on

A permutation is a mathematical function that maps numbers from the set \(\{1,2,\ldots,n\}\) to other numbers from the same set.

An example of a permutation for \(n=5\) would be

\[ s=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix} \]

This two-line notation denotes a function \(s\) such that \(s(1)=2\), \(s(2)=5\) and so on.

In R, we can represent a permutation by its second row, and assume that the first row implicitly consists of the numbers from \(1\) to \(n\) in the ascending order:

s <- c(2, 5, 4, 3, 1)

An R vector represents a permutation if and only if it consists of n integers between 1 and n, and each integer occurs in the vector exactly once.

Here is how the mathematics of permutations maps to R:

The main use for permutations in R is to rearrange the data. If s is a permutation vector and d is an arbitrary vector (i.e. its elements are not necessarily numbers from 1 to n), then d[s] is some rearrangement of d. Specifically, d[s][i] == d[s[i]], so the ith element of d[s] is the s[i]th element of d.

Often we need to solve the reverse problem: we need to move the ith element of d to the position s[i] in the resulting vector. Because the inverse permutation of s is order(s), d[order(s)] is exactly what we need.

As I explain in rank vs. order in R, rank(a) and order(a) are inverse permutations of each other. But above I said that order(s) is the inverse of s itself. It follows, therefore, that when s is a permutation vector, rank(s) coincides with s itself:

s <- sample(100)
all(rank(s) == s)
[1] TRUE

Example: sort monthly data

As an example, let’s use the algebra of permutations to put some monthly data in the chronological order:

set.seed(2017-11-18)
v <- rpois(12, 50)
names(v) <- sample(month.abb)
v
Oct Mar Jul Aug Sep Nov Dec Apr Jun Jan May Feb 
 50  47  48  53  42  56  61  59  56  57  50  58 

The proper chronological order is given by the month.abb vector:

 [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"

Because our goal is to rearrange the vector v, the solution will have the form v[order(s)], where the permutation s specifies, for each index i, that the ith element should be moved to the s[i]th position.

Our strategy will be to build s as a composition of two permutations, x[y], where y transforms our random order into the alphabetical order of month names, and x then transforms the alphabetical order to the chronological order. (Remember that x[y][i] == x[y[i]], so y is the permutation that gets applied first.)

We need y to send the month name names(v)[i] to its correct position in the alphabetically-sorted vector. This position is given by rank(names(v))[i]:

y <- rank(names(v))

Let’s now consider x. It must send an index i in the alphabetically sorted vector to the corresponding index in the chronological vector. rank(month.abb) sends indexes from the chronological vector to the alphabetical vector, so x must be the inverse of that:

x <- order(rank(month.abb))

Let’s see if we got x and y right:

s <- x[y]
v[order(s)]
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec 
 57  58  47  59  50  56  48  53  42  50  56  61 

Yay!

Now, let’s simplify our code. Let’s denote rank(names(v)) as \(r_1\) and rank(month.abb) as \(r_2\), and recall that order of a permutation is its inverse.

Then \(y\) is \(r_1\), \(x\) is \(r_2^{-1}\), and order(s) is

\[ s^{-1} = (xy)^{-1} = (r_2^{-1} r_1)^{-1} = r_1^{-1} r_2. \]

Which, when translated back to R, corresponds to the permutation

order(rank(names(v)))[rank(month.abb)]

Can we also simplify order(rank(names(v)))? You may remember that order and rank are inverse of each other and cancel them out. That would be a mistake. For any a (without duplicate elements), order(a) and rank(a) are inverse permutations (order(a)[rank(a)] == 1:length(a)); however, order and rank are not inverse functions. It turns out that order(rank(a)) is simply order(a), because rank preserves the order of the elements.

Our final version is therefore

v[order(names(v))[rank(month.abb)]]

One might arrive at this result via an easier route, perhaps observing that v[order(names(v))] gives the alphabetically sorted vector and [rank(month.abb)] then restores the chronological order, but it may be less obvious in more complicated problems.

Also, as I show in rank vs. order, if you do this by trial and error and use small vectors to test your code, you can easily arrive at something that looks right but isn’t. Let’s say your data had only 4 months worth of data:

set.seed(2)
month.abb <- month.abb[1:4]
v <- rpois(4, 50)
names(v) <- sample(month.abb)
v
Jan Mar Apr Feb 
 43  43  42  50 

Then this may look like the right solution (it’s not!):

v[rank(names(v))[rank(month.abb)]]
Jan Feb Mar Apr 
 43  50  43  42 

So here’s how I recommend to approach such rearrangement problems:

  1. Start with v[order(s)], transforming the problem into “where do I send the ith element”
  2. Find or derive s
  3. Optionally, simplify order(s) using the algebra of permutations