How to pass a logical matrix to RStan

I ran into this problem with RStan today. I’m sure I’ve hit it at least once before, and it took me a ridiculously long time to figure out, so I’m documenting it here.

Suppose you need to pass a multidimensional logical array to RStan (events in the example below). The details of the model don’t matter; let’s say it’s a simple beta-binomial model:

model <- stan_model(auto_write = TRUE, model_code = "
data {
  int<lower=1> m;
  int<lower=1> n;
  int<lower=0,upper=1> events[m,n];
}
parameters {
  real<lower=0> a;
  real<lower=0> b;
  vector<lower=0,upper=1>[m] theta;
}
model {
  theta ~ beta(a,b);
  for (i in 1:m) {
    sum(events[i]) ~ binomial(n, theta[i]);
  }
}
")

The corresponding R code might look like this:

library(rstan)

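# Compile the Stan program above (here read from a file, model.stan)
model <- stan_model("model.stan", auto_write = TRUE)

# Hyperparameters and data dimensions used for the simulation
a <- 3
b <- 5
n <- 20
m <- 30

# Simulate an m-by-n logical matrix: each row shares one success probability
events <- t(sapply(1:m,
  function(i) {
    p <- rbeta(1, a, b)
    as.logical(rbinom(n, 1, p))
  }))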

sampling(model, data = list(m = m, n = n, events = events))

But this results in the following error:

Error in new_CppObject_xp(fields$.module, fields$.pointer, ...) : 
  Exception: mismatch in number dimensions declared and found in context; processing stage=data initialization; variable name=events; dims declared=(30,20); dims found=(600)  (in 'model3e955026c5b4_model' at line 4)
## failed to create the sampler; sampling not done

The important bit is dims declared=(30,20); dims found=(600). It looks like the events matrix got flattened… but why? We can double-check that events is still a matrix of the correct dimensions:

class(events)
## [1] "matrix"
dim(events)
## [1] 30 20

Here’s what’s going on. Because events in R is a logical (i.e. boolean) matrix, and events in Stan is an integer two-dimensional array, RStan tries to convert the former to the latter by calling as.integer. And indeed, as.integer drops all attributes, including dim, so the 30-by-20 matrix comes back as a plain vector of length 600.
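To see the flattening for yourself, here is a small base-R check that does not need RStan. The toy matrix x is made up for illustration and simply stands in for events.

```r
# A small logical matrix standing in for `events` (illustrative data only)
x <- matrix(c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE), nrow = 2)

dim(x)        # 2 3

# as.integer() converts the values but drops attributes, including dim
flat <- as.integer(x)
dim(flat)     # NULL
length(flat)  # 6
```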

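So how do you convert a logical matrix to an integer one without losing its dimensions? There is certainly more than one way to do it, but a quick and dirty solution, which I found in the archives of r-help, is to add 0 to it. Arithmetic keeps the dim attribute, so the result is still a 30-by-20 matrix, now holding the numbers 0 and 1 instead of FALSE and TRUE. So: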

sampling(model, data = list(m = m, n = n, events = events + 0))
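Strictly speaking, adding 0 produces a double matrix rather than an integer one, which is what worked in the call above. If you would rather end up with a true integer matrix, here is a minimal base-R sketch of a few alternatives. The matrix x is a made-up stand-in for events, and to_int_array is a hypothetical helper name of my own, not part of RStan.

```r
# A small logical matrix standing in for `events` (illustrative data only)
x <- matrix(c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE), nrow = 2)

a <- x + 0    # double matrix, dim preserved
b <- x + 0L   # integer matrix, dim preserved

# storage.mode<- changes the type and keeps dim (and dimnames)
c2 <- x
storage.mode(c2) <- "integer"

# Hypothetical helper: convert every logical element of a data list
# to integer while leaving everything else untouched
to_int_array <- function(data) {
  lapply(data, function(v) {
    if (is.logical(v)) storage.mode(v) <- "integer"
    v
  })
}

stan_data <- to_int_array(list(m = 2, n = 3, events = x))
typeof(stan_data$events)  # "integer"
dim(stan_data$events)     # 2 3
```

The helper is handy when a data list holds several logical arrays: wrap the list once before handing it to sampling instead of patching each element by hand.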