system2 in R considered inadequate

When you need to run another program (executable) from your program, there are two different ways to go about it: to run the executable directly, or to go through the shell.

“Going through the shell” means that instead of running the executable you want to run, your program will run the shell, /bin/sh, which in turn will be told (via the -c option) to run the executable you want.

Why would one want to go through the shell? There are a couple of advantages:

  1. When going through the shell, you can use a familiar syntax to write the arguments, e.g. system("cp -r dir1 dir2"). On the other hand, when running the executable directly, you have to split the command into the executable name (cp) and the array of arguments "-r", "dir1", "dir2".
  2. You get the various shell features for free: input/output redirection, pipes, variable expansion, etc.

But there are downsides, too:

  1. There is a bit of overhead for launching the shell and having the shell parse the command and do other housekeeping. Often this overhead is negligible compared to all the other stuff that your program is doing, but of course it depends on what exactly it’s doing and how many times it has to run the shell/executable.

  2. More importantly, now you have to worry about how exactly the shell will parse your command. If dir1 and dir2 are variables, you might construct your shell command as

    paste("cp -r", dir1, dir2)

    (R’s paste() function will by default add spaces between its arguments.)

    But what if dir1 or dir2 contains spaces or other special shell characters?
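
To make the question concrete, here is a minimal sketch in Python (one of the languages discussed below), assuming a POSIX system with /bin/sh. The directory names are hypothetical, and instead of cp the child is a tiny Python one-liner that just reports the arguments it received, so nothing is copied.

```python
import json
import shlex
import subprocess
import sys

# A tiny child program that reports the arguments it actually received.
SHOW_ARGS = "import sys, json; print(json.dumps(sys.argv[1:]))"

dir1, dir2 = "my photos", "backup"  # hypothetical directory names

# Through the shell: the fixed part is quoted safely, but dir1 and dir2
# are pasted in with spaces, like paste("cp -r", dir1, dir2) would do.
prefix = f"{shlex.quote(sys.executable)} -c {shlex.quote(SHOW_ARGS)}"
command = " ".join([prefix, dir1, dir2])
via_shell = subprocess.run(
    command, shell=True, capture_output=True, text=True, check=True
)

# Directly: every list element reaches the child as exactly one argument.
direct = subprocess.run(
    [sys.executable, "-c", SHOW_ARGS, dir1, dir2],
    capture_output=True, text=True, check=True,
)

shell_args = json.loads(via_shell.stdout)
direct_args = json.loads(direct.stdout)
print(shell_args)   # ['my', 'photos', 'backup']
print(direct_args)  # ['my photos', 'backup']
```

With the shell in between, the two intended arguments arrive as three. A real cp would then try to copy "my" and "photos" into "backup".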

For these reasons, my rule of thumb is to always run the executable directly, unless there’s a specific reason why a shell is needed, which in my experience is very rare.

Most programming languages offer both options:

  1. The C standard defines the system function, which executes a command through the command processor (i.e. the shell on POSIX systems). On POSIX systems, one can instead execute the command directly by using fork and execvp (or another function from the exec* family).

  2. In Python, there’s a boolean shell parameter to the subprocess.run() function that specifies whether the call should go through the shell.

  3. In Haskell’s System.Process, there’s callCommand, which goes through the shell, and there’s callProcess, which doesn’t.

  4. In Perl, there’s just one function, system, which does both things depending on how it’s invoked. If it gets more than one argument, it runs the executable directly, treating the first argument as the executable name and the rest as arguments for the executable.

    If there’s just one argument to system, then it’s considered as a shell command.

    This is still less explicit than I prefer (what if you need to run an executable without arguments, but the executable name may contain spaces or special characters?), but it’s at least something.
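
As an illustration of having both options, here is a sketch of the Python case: the same two-stage pipeline written once through the shell and once directly. It assumes a POSIX system with printf and sort on the PATH. The pipeline itself is just an example, not something taken from the list above.

```python
import subprocess

# Through the shell: one string, and the shell parses it and sets up the pipe.
via_shell = subprocess.run(
    "printf 'b\\na\\n' | sort",
    shell=True, capture_output=True, text=True, check=True,
)

# Directly: each program gets an explicit argument list, and Python
# hands the first program's output to the second one as its input.
producer = subprocess.run(
    ["printf", "b\\na\\n"], capture_output=True, check=True
)
consumer = subprocess.run(
    ["sort"], input=producer.stdout, capture_output=True, check=True
)
direct = consumer.stdout.decode()

print(via_shell.stdout == direct)  # True
```

The direct version is longer, but no string is ever parsed by a shell, so the arguments cannot be split or reinterpreted on the way.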

Now on to R. In R, there are two functions, system and system2. system takes the command as a single string, so it makes sense that it would go through the shell. system2 is described as “a more portable and flexible interface than ‘system’”, and it takes the command and the list of arguments as separate arguments:

   system2(command, args, ...)

So one might hope that it doesn’t go through the shell. Sadly, not only does it go through the shell, it doesn’t even attempt to quote the args list so that it corresponds to the arguments actually received by the command. A single argument in args may end up as several arguments to command, or even vice versa!

> system2("printf", c("%s\\\\n" , "a b"))
a
b
> system2("printf", c("%s\\\\n" , "'", "a", "b", "'"))
 a b 

Looking at the source code for system2, the argument list is simply concatenated with spaces — what’s the point of having it as a list at all?

system2 <- function(command, args = character(), ...) {
    ...
    command <- paste(c(env, shQuote(command), args), collapse = " ")
    ...
    .Internal(system(command, intern, timeout))
}
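
The effect of that paste call can be modelled without running anything. The sketch below is an approximation in Python, not R's actual code: build_command and shell_words are hypothetical helpers, the env part is left out, and shlex.split only imitates POSIX word splitting and quote removal (no variable expansion or globbing).

```python
import shlex

def build_command(command, args):
    """Mimic the excerpt: quote the command, join the args with spaces."""
    return " ".join([shlex.quote(command)] + list(args))

def shell_words(command_line):
    """Approximate how a POSIX shell splits the string into words."""
    return shlex.split(command_line)

# Same characters as the R string "%s\\\\n": %, s, two backslashes, n.
fmt = "%s\\\\n"

one_becomes_two = shell_words(build_command("printf", [fmt, "a b"]))
four_become_one = shell_words(build_command("printf", [fmt, "'", "a", "b", "'"]))

print(one_becomes_two)  # ['printf', '%s\\n', 'a', 'b']
print(four_become_one)  # ['printf', '%s\\n', ' a b ']
```

Under this model the single argument "a b" reaches printf as two words, while the four arguments "'", "a", "b", "'" collapse into the single word " a b ", matching the two outputs shown earlier.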

As far as I can tell, there’s currently no standard way in R to run an executable directly, bypassing the shell, which is a shame. In order to avoid issues with spaces and special characters in arguments, you have to quote each one using shQuote:

> system2("printf", c("%s\\\\n" , shQuote("a b")))
a b
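
The same workaround can be sketched in Python, where shlex.quote plays roughly the role of shQuote. This assumes a POSIX system with /bin/sh. run_quoted is a hypothetical helper, and the child is a Python one-liner that reports the arguments it received.

```python
import json
import shlex
import subprocess
import sys

# A tiny child program that reports the arguments it actually received.
SHOW_ARGS = "import sys, json; print(json.dumps(sys.argv[1:]))"

def run_quoted(argv):
    """Hypothetical helper: quote every word, then hand the line to the shell."""
    line = " ".join(shlex.quote(word) for word in argv)
    return subprocess.run(
        line, shell=True, capture_output=True, text=True, check=True
    )

# Arguments that an unquoted shell command would split, expand, or drop.
tricky = ["a b", "it's", "$HOME", "*.txt", ""]
result = run_quoted([sys.executable, "-c", SHOW_ARGS] + tricky)
round_trip = json.loads(result.stdout)

print(round_trip == tricky)  # True
```

Quoting has to be applied to every argument, every time. Forgetting it once silently brings back the splitting problem, which is why running the executable directly is the more robust default.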