# Algebra of permutations in R

Published on

A permutation is a mathematical function that maps numbers from the set \(\{1,2,\ldots,n\}\) to other numbers from the same set.

An example of a permutation for \(n=5\) would be

\[ s=\begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 5 & 4 & 3 & 1 \end{pmatrix} \]

This two-line notation denotes a function \(s\) such that \(s(1)=2\), \(s(2)=5\) and so on.

In R, we can represent a permutation by its second row, and assume that the first row implicitly consists of the numbers from \(1\) to \(n\) in the ascending order:

`s <- c(2, 5, 4, 3, 1)`

An R vector represents a permutation if and only if it consists of
`n`

integers between `1`

and `n`

, and
each integer occurs in the vector exactly once.

Here is how the mathematics of permutations maps to R:

- The application \(s(i)\) (where
\(i\in \{1,\ldots,n\}\)) corresponds to
`s[i]`

. - The composition of permutations \(s\) and \(t\), \(s \circ
t\), corresponds to the vector
`s[t]`

. - the identity permutation, \(\mathrm{id}\), corresponds to the vector
`1:n`

, because`s[1:n] == s`

. - The inverse of a permutation \(s\),
\(s^{-1}\), corresponds to the vector
`order(s)`

. This is because`s[order(s)] == 1:s`

. `sample(n)`

gives a random permutation of size`n`

.

The main use for permutations in R is to rearrange the data. If
`s`

is a permutation vector and `d`

is an
arbitrary vector (i.e. its elements are not necessarily numbers from
`1`

to `n`

), then `d[s]`

is some
rearrangement of `d`

. Specifically,
`d[s][i] == d[s[i]]`

, so the `i`

th element of
`d[s]`

is the `s[i]`

th element of
`d`

.

Often we need to solve the reverse problem: we need to move the
`i`

th element of `d`

to the position
`s[i]`

in the resulting vector. Because the inverse
permutation of `s`

is `order(s)`

,
`d[order(s)]`

is exactly what we need.

As I explain in rank
vs. order in R, `rank(a)`

and `order(a)`

are
inverse permutations of each other. But above I said that
`order(s)`

is the inverse of `s`

itself. It
follows, therefore, that when `s`

is a permutation vector,
`rank(s)`

coincides with `s`

itself:

```
<- sample(100)
s all(rank(s) == s)
```

`[1] TRUE`

## Example: sort monthly data

As an example, let’s use the algebra of permutations to put some monthly data in the chronological order:

```
set.seed(2017-11-18)
<- rpois(12, 50)
v names(v) <- sample(month.abb)
v
```

```
Oct Mar Jul Aug Sep Nov Dec Apr Jun Jan May Feb
50 47 48 53 42 56 61 59 56 57 50 58
```

The proper chronological order is given by the `month.abb`

vector:

` [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" "Dec"`

Because our goal is to rearrange the vector `v`

, the
solution will have the form `v[order(s)]`

, where the
permutation `s`

specifies, for each index `i`

,
that the `i`

th element should be moved to the
`s[i]`

th position.

Our strategy will be to build `s`

as a composition of two
permutations, `x[y]`

, where `y`

transforms our
random order into the alphabetical order of month names, and
`x`

then transforms the alphabetical order to the
chronological order. (Remember that `x[y][i] == x[y[i]]`

, so
`y`

is the permutation that gets applied first.)

We need `y`

to send the month name
`names(v)[i]`

to its correct position in the
alphabetically-sorted vector. This position is given by
`rank(names(v))[i]`

:

`<- rank(names(v)) y `

Let’s now consider `x`

. It must send an index
`i`

in the alphabetically sorted vector to the corresponding
index in the chronological vector. `rank(month.abb)`

sends
indexes from the chronological vector to the alphabetical vector, so
`x`

must be the inverse of that:

`<- order(rank(month.abb)) x `

Let’s see if we got `x`

and `y`

right:

```
<- x[y]
s order(s)] v[
```

```
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
57 58 47 59 50 56 48 53 42 50 56 61
```

Yay!

Now, let’s simplify our code. Let’s denote
`rank(names(v))`

as \(r_1\)
and `rank(month.abb)`

as \(r_2\), and recall that `order`

of a permutation is its inverse.

Then \(y\) is \(r_1\), \(x\) is \(r_2^{-1}\), and `order(s)`

is

\[ s^{-1} = (xy)^{-1} = (r_2^{-1} r_1)^{-1} = r_1^{-1} r_2. \]

Which, when translated back to R, corresponds to the permutation

`order(rank(names(v)))[rank(month.abb)]`

Can we also simplify `order(rank(names(v)))`

? You may
remember that `order`

and `rank`

are inverse of
each other and cancel them out. That would be a mistake. For any
`a`

(without duplicate elements), `order(a)`

and
`rank(a)`

are inverse *permutations*
(`order(a)[rank(a)] == 1:length(a)`

); however,
`order`

and `rank`

are not inverse
*functions*. It turns out that `order(rank(a))`

is
simply `order(a)`

, because `rank`

preserves the
order of the elements.

Our final version is therefore

`order(names(v))[rank(month.abb)]] v[`

One might arrive at this result via an easier route, perhaps
observing that `v[order(names(v))]`

gives the alphabetically
sorted vector and `[rank(month.abb)]`

then restores the
chronological order, but it may be less obvious in more complicated
problems.

Also, as I show in rank vs. order, if you do this by trial and error and use small vectors to test your code, you can easily arrive at something that looks right but isn’t. Let’s say your data had only 4 months worth of data:

```
set.seed(2)
month.abb <- month.abb[1:4]
v <- rpois(4, 50)
names(v) <- sample(month.abb)
v
```

```
Jan Mar Apr Feb
43 43 42 50
```

Then this may look like the right solution (it’s not!):

`rank(names(v))[rank(month.abb)]] v[`

```
Jan Feb Mar Apr
43 50 43 42
```

So here’s how I recommend to approach such rearrangement problems:

- Start with
`v[order(s)]`

, transforming the problem into “where do I send the`i`

th element” - Find or derive
`s`

- Optionally, simplify
`order(s)`

using the algebra of permutations