# Algebra of permutations in R

Published on

A permutation is a mathematical function that maps numbers from the set \(\{1,2,\ldots,n\}\) to other numbers from the same set.

An example of a permutation for \(n=5\) would be

\[ s=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix} \]

This two-line notation denotes a function \(s\) such that \(s(1)=2\), \(s(2)=5\) and so on.

In R, we can represent a permutation by its second row, and assume that the first row implicitly consists of the numbers from \(1\) to \(n\) in the ascending order:

`s <- c(2, 5, 4, 3, 1)`

An R vector represents a permutation if and only if it consists of `n`

integers between `1`

and `n`

, and each integer occurs in the vector exactly once.

Here is how the mathematics of permutations maps to R:

- The application \(s(i)\) (where \(i\in \{1,\ldots,n\}\)) corresponds to
`s[i]`

. - The composition of permutations \(s\) and \(t\), \(s \circ t\), corresponds to the vector
`s[t]`

. - the identity permutation, \(\mathrm{id}\), corresponds to the vector
`1:n`

, because`s[1:n] == s`

. - The inverse of a permutation \(s\), \(s^{-1}\), corresponds to the vector
`order(s)`

. This is because`s[order(s)] == 1:s`

. `sample(n)`

gives a random permutation of size`n`

.

The main use for permutations in R is to rearrange the data. If `s`

is a permutation vector and `d`

is an arbitrary vector (i.e. its elements are not necessarily numbers from `1`

to `n`

), then `d[s]`

is some rearrangement of `d`

. Specifically, `d[s][i] == d[s[i]]`

, so the `i`

th element of `d[s]`

is the `s[i]`

th element of `d`

.

Often we need to solve the reverse problem: we need to move the `i`

th element of `d`

to the position `s[i]`

in the resulting vector. Because the inverse permutation of `s`

is `order(s)`

, `d[order(s)]`

is exactly what we need.

As I explain in rank vs order in R, `rank(a)`

and `order(a)`

are inverse permutations of each other. But above I said that `order(s)`

is the inverse of `s`

itself. It follows, therefore, that when `s`

is a permutation vector, `rank(s)`

coincides with `s`

itself:

`[1] TRUE`

## Example: sort monthly data

As an example, let’s use the algebra of permutations to put some monthly data in the chronological order:

```
Oct Mar Jul Aug Sep Nov Dec Apr Jun Jan May Feb
50 47 48 53 42 56 61 59 56 57 50 58
```

The proper chronological order is given by the `month.abb`

vector:

` [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"`

Because our goal is to rearrange the vector `v`

, the solution will have the form `v[order(s)]`

, where the permutation `s`

specifies, for each index `i`

, that the `i`

th element should be moved to the `s[i]`

th position.

Our strategy will be to build `s`

as a composition of two permutations, `x[y]`

, where `y`

transforms our random order into the alphabetical order of month names, and `x`

then transforms the alphabetical order to the chronological order. (Remember that `x[y][i] == x[y[i]]`

, so `y`

is the permutation that gets applied first.)

We need `y`

to send the month name `names(v)[i]`

to its correct position in the alphabetically-sorted vector. This position is given by `rank(names(v))[i]`

:

Let’s now consider `x`

. It must send an index `i`

in the alphabetically sorted vector to the corresponding index in the chronological vector. `rank(month.abb)`

sends indexes from the chronological vector to the alphabetical vector, so `x`

must be the inverse of that:

Let’s see if we got `x`

and `y`

right:

```
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
57 58 47 59 50 56 48 53 42 50 56 61
```

Yay!

Now, let’s simplify our code. Let’s denote `rank(names(v))`

as \(r_1\) and `rank(month.abb)`

as \(r_2\), and recall that `order`

of a permutation is its inverse.

Then \(y\) is \(r_1\), \(x\) is \(r_2^{-1}\), and `order(s)`

is

\[ s^{-1} = (xy)^{-1} = (r_2^{-1} r_1)^{-1} = r_1^{-1} r_2. \]

Which, when translated back to R, corresponds to the permutation

Can we also simplify `order(rank(names(v)))`

? You may remember that `order`

and `rank`

are inverse of each other and cancel them out. That would be a mistake. For any `a`

(without duplicate elements), `order(a)`

and `rank(a)`

are inverse *permutations* (`order(a)[rank(a)] == 1:length(a)`

); however, `order`

and `rank`

are not inverse *functions*. It turns out that `order(rank(a))`

is simply `order(a)`

, because `rank`

preserves the order of the elements.

Our final version is therefore

One might arrive at this result via an easier route, perhaps observing that `v[order(names(v))]`

gives the alphabetically sorted vector and `[rank(month.abb)]`

then restores the chronological order, but it may be less obvious in more complicated problems.

Also, as I show in rank vs order, if you do this by trial and error and use small vectors to test your code, you can easily arrive at something that looks right but isn’t. Let’s say your data had only 4 months worth of data:

```
set.seed(2)
month.abb <- month.abb[1:4]
v <- rpois(4, 50)
names(v) <- sample(month.abb)
v
```

```
Jan Mar Apr Feb
43 43 42 50
```

Then this may look like the right solution (it’s not!):

```
Jan Feb Mar Apr
43 50 43 42
```

So here’s how I recommend to approach such rearrangement problems:

- Start with
`v[order(s)]`

, transforming the problem into “where do I send the`i`

th element” - Find or derive
`s`

- Optionally, simplify
`order(s)`

using the algebra of permutations