system2 in R considered inadequate
When you need to run another program (executable) from your program, there are two different ways to go about it: to run the executable directly, or to go through the shell.
“Going through the shell” means that instead of running the executable you want to run, your program will run the shell, /bin/sh, which in turn will be told (via the -c option) to run the executable you want.
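To make the distinction concrete, here is a minimal Python sketch. It assumes a POSIX system with /bin/sh and an echo executable on the PATH, and the function names are made up for illustration. run_direct starts the program itself. run_via_shell starts /bin/sh and hands it the whole command line via -c.

```python
import subprocess

def run_direct(argv):
    """Run an executable directly: argv[0] is the program, the rest are its arguments."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout

def run_via_shell(command):
    """Go through the shell: this is literally /bin/sh -c COMMAND."""
    result = subprocess.run(["/bin/sh", "-c", command],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(run_direct(["echo", "hello", "world"]), end="")  # hello world
print(run_via_shell("echo hello world"), end="")       # hello world
```

Both calls print the same thing here. The difference is who splits the command into words: your program in the first case, the shell in the second.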
Why would one want to go through the shell? There are a couple of advantages:
- When going through the shell, you can use a familiar syntax to write the arguments, e.g. system("cp -r dir1 dir2"). On the other hand, when running the executable directly, you have to split the command into the executable name ("cp") and the array of arguments ("-r", "dir1", "dir2").
- You get the various shell features for free: input/output redirection, pipes, variable expansion, etc.
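Here is a small Python illustration of both advantages, under the same POSIX assumptions.

- shlex.split applies shell-like word splitting. It handles quoting but not the rest of the shell grammar, so it shows how the one-string form maps onto the executable-plus-arguments form.
- The two helpers (hypothetical names) show that a pipe only exists when a shell is involved. Without a shell, | is just another argument.

```python
import shlex
import subprocess

# The one-string form you would hand to a shell...
command = "cp -r dir1 dir2"
# ...versus the executable name plus argument array that a direct call needs.
argv = shlex.split(command)
print(argv)  # ['cp', '-r', 'dir1', 'dir2']

def pipe_via_shell():
    """The shell sets up the pipe: printf's output is fed to wc -l."""
    result = subprocess.run(["/bin/sh", "-c", "printf 'a\\nb\\n' | wc -l"],
                            capture_output=True, text=True, check=True)
    return int(result.stdout.strip())

def pipe_without_shell():
    """No shell, no pipe: echo simply receives "|", "wc" and "-l" as arguments."""
    result = subprocess.run(["echo", "a", "|", "wc", "-l"],
                            capture_output=True, text=True, check=True)
    return result.stdout

print(pipe_via_shell())               # 2
print(pipe_without_shell(), end="")   # a | wc -l
```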
But there are downsides, too:
There is a bit of overhead for launching the shell and having the shell parse the command and do other housekeeping. Often this overhead is negligible compared to all the other stuff that your program is doing, but of course it depends on what exactly it’s doing and how many times it has to run the shell/executable.
More importantly, now you have to worry about how exactly the shell will parse your command. If dir1 and dir2 are variables, you might construct your shell command as paste("cp -r", dir1, dir2) (R’s paste() function separates its arguments with spaces by default). But what if dir1 or dir2 contain spaces or other special shell characters? The shell will split a name containing a space into two arguments. It will also interpret characters such as ;, $, or * instead of passing them through to cp.
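Here is a sketch of that failure mode in Python. It assumes a POSIX system, and paste, args_received, and the directory names are hypothetical. Instead of actually running cp, it hands the same pasted string to printf '[%s]\n' through the shell. That prints each argument it receives in brackets, so the argument boundaries become visible.

```python
import subprocess

def paste(*pieces):
    """Mimic R's paste() with its default sep = " "."""
    return " ".join(pieces)

def args_received(arg_string):
    """Run `printf '[%s]\\n' <arg_string>` through /bin/sh and return the
    arguments printf actually received (one bracketed line per argument)."""
    script = "printf '[%s]\\n' " + arg_string
    out = subprocess.run(["/bin/sh", "-c", script],
                         capture_output=True, text=True, check=True).stdout
    return [line[1:-1] for line in out.splitlines()]

# Hypothetical directory names; the first one contains a space.
dir1, dir2 = "my dir", "backup"
# Like paste("cp -r", dir1, dir2), minus "cp", so printf shows what cp would get.
print(args_received(paste("-r", dir1, dir2)))  # ['-r', 'my', 'dir', 'backup']
```

The directory name with a space turns into two arguments. cp would therefore be asked to copy my and dir into backup.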
For these reasons, my rule of thumb is to always run the executable directly, unless there’s a specific reason why a shell is needed, which in my experience is very rare.
Most programming languages offer both options:
- The C standard defines the system function, which executes a command through the command processor (i.e. the shell on POSIX systems). On POSIX systems, one can instead execute the command directly by using fork and execvp (or another function from the exec* family).
- In Python, there’s a boolean shell parameter to the subprocess.run() function that specifies whether the call should go through the shell.
- In Haskell’s System.Process, there’s callCommand, which goes through the shell, and there’s callProcess, which doesn’t.
- In Perl, there’s just one function, system, which does both things depending on how it’s invoked. If it gets more than one argument, it runs the executable directly, treating the first argument as the executable name and the rest as arguments for the executable. If there’s just one argument to system, then it’s treated as a shell command. This is still less explicit than I would prefer (what if you need to run an executable without arguments, but the executable name may contain spaces or special characters?), but it’s at least something.
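Here is a sketch of the Python option from this list. It assumes POSIX and a printf executable on the PATH, and the wrapper names are hypothetical. It requests the same output with and without shell=True. On POSIX, subprocess uses /bin/sh when shell=True.

```python
import subprocess

def printf_direct(*args):
    # shell=False (the default): the list goes to the printf executable as-is,
    # so "a b" arrives as one argument.
    result = subprocess.run(["printf", "%s\\n", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout

def printf_shell(command):
    # shell=True: the string is given to the shell (/bin/sh on POSIX),
    # which splits it into words before printf runs.
    result = subprocess.run(command, shell=True,
                            capture_output=True, text=True, check=True)
    return result.stdout

print(printf_direct("a b"), end="")                # a b
print(printf_shell("printf '%s\\n' a b"), end="")  # a, then b on its own line
```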
Now on to R. In R, there are two functions, system and system2. system takes the command as a single string, so it makes sense that it would go through the shell. system2 is described as “a more portable and flexible interface than ‘system’”, and it takes the command and the list of arguments as separate arguments:
system2(command, args, ...)
So one might hope that it doesn’t go through the shell. Sadly, not only does it go through the shell, it doesn’t even attempt to quote the elements of args so that they correspond to the arguments actually received by the command. A single element of args may end up as several arguments to command, or even vice versa!
> system2("printf", c("%s\\\\n" , "a b"))
a
b
> system2("printf", c("%s\\\\n" , "'", "a", "b", "'"))
a b
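The behavior in this session can be reproduced outside R. The following Python sketch is a hypothetical re-creation, not R’s actual code. It quotes the command name, joins the arguments with spaces without quoting them, and runs the result with /bin/sh -c. shlex.quote stands in for shQuote, though their quoting styles differ slightly.

```python
import shlex
import subprocess

def system2_like(command, args=()):
    """Hypothetical re-creation of how R's system2() treats its arguments:
    only the command name is quoted (shlex.quote standing in for shQuote),
    the args are pasted in unquoted, and the line is run via /bin/sh -c."""
    line = " ".join([shlex.quote(command), *args])
    result = subprocess.run(["/bin/sh", "-c", line],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Same arguments as the R session; r"%s\\n" is the string that the
# R literal "%s\\\\n" denotes.
print(repr(system2_like("printf", [r"%s\\n", "a b"])))                # 'a\nb\n'
print(repr(system2_like("printf", [r"%s\\n", "'", "a", "b", "'"])))  # ' a b \n'
```

The single element "a b" reaches printf as two arguments. The five-element vector collapses into just two: the format, and " a b " with surrounding spaces, because the quotes were separate elements.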
Looking at the source code for system2, we see that the elements of args are simply concatenated with spaces. What’s the point of accepting them as a list at all?
system2 <- function(command, args = character(), ...) {
...
command <- paste(c(env, shQuote(command), args), collapse = " ")
...
.Internal(system(command, intern, timeout))
}
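The loss of information is easy to demonstrate with a Python transcription of the paste() line above. This is a sketch: shlex.quote replaces shQuote, and the function name is made up. Two argument vectors that should mean different things produce the same command string, so nothing downstream can tell them apart.

```python
import shlex

def system2_command_line(command, args=(), env=()):
    """Python mirror of the line quoted from system2's source:
    paste(c(env, shQuote(command), args), collapse = " ").
    shlex.quote stands in for shQuote; args and env are not quoted."""
    return " ".join([*env, shlex.quote(command), *args])

one_arg = system2_command_line("printf", ["%s\\n", "a b"])
two_args = system2_command_line("printf", ["%s\\n", "a", "b"])
print(one_arg)              # printf %s\n a b
print(one_arg == two_args)  # True: the argument boundaries are gone
```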
As far as I can tell, there’s currently no standard way in R to run an executable directly, bypassing the shell, which is a shame. In order to avoid issues with spaces and special characters in arguments, you have to quote each one using shQuote:
> system2("printf", c("%s\\\\n" , shQuote("a b")))
a b
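Here is the same workaround sketched in Python, under the earlier POSIX assumptions, with hypothetical function names and shlex.quote in place of shQuote. Once every piece is quoted, the shell’s word splitting recovers exactly the intended arguments. This holds even for a single quote, or for text like $HOME; echo oops, which reaches printf literally.

```python
import shlex
import subprocess

def quoted_command_line(command, args):
    """Quote every piece (command and each argument), as the shQuote
    workaround does; shlex.quote stands in for R's shQuote here."""
    return " ".join(shlex.quote(piece) for piece in [command, *args])

def run_quoted(command, args):
    line = quoted_command_line(command, args)
    result = subprocess.run(["/bin/sh", "-c", line],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Arguments with a space, a single quote, and shell metacharacters.
args = ["%s\\n", "a b", "it's", "$HOME; echo oops"]
# The shell's word splitting now recovers exactly the intended arguments.
print(shlex.split(quoted_command_line("printf", args)) == ["printf", *args])  # True
print(run_quoted("printf", args), end="")
```

The shell is still launched, so its overhead remains. The quoting only makes the shell’s parsing harmless.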