Introduction
It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby's green threading doesn't provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.
Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I'll be demonstrating are applicable to running any type of program in a sub-process. I'll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we'll leave that for another series.
In the first and second articles, I'll demonstrate some of the facilities for starting sub-processes that Ruby possesses out-of-the-box, no requires needed. In the third article we'll look at some tools provided in Ruby's Standard Library which build on the methods introduced in part one. And in the fourth installment I'll briefly survey a few of the many RubyGems which simplify sub-process interactions.
Getting Started
To begin, let's define a few helper methods and constants which we'll refer back to throughout the series. First, let's define a simple method which will serve as our "child" code: the code we want to execute in a sub-process. Here it is:
def hello(source, expect_input)
  puts "[child] Hello from #{source}"
  if expect_input
    puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
  else
    puts "[child] No stdin, or stdin is same as parent's"
  end
  $stderr.puts "[child] Hello, standard error"
end
(Note: The full source code for this article can be found at http://gist.github.com/137705)
This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we'll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.
Next, let's define a couple of helpful constants.
require 'rbconfig'
THIS_FILE = File.expand_path(__FILE__)
RUBY = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])
The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.
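As a possible shortcut, recent Ruby versions also provide RbConfig.ruby, which returns the path of the running interpreter in a single call. The sketch below compares it with a hand-assembled path; the variable names by_hand and shortcut are purely illustrative and not part of the demo source.

```ruby
require 'rbconfig'

# Assemble the interpreter path by hand from the build configuration.
# EXEEXT is empty on UNIX-style platforms and ".exe" on Windows.
by_hand = File.join(
  RbConfig::CONFIG['bindir'],
  RbConfig::CONFIG['ruby_install_name'] + RbConfig::CONFIG['EXEEXT']
)

# RbConfig.ruby performs the lookup in one call.
shortcut = RbConfig.ruby

puts "by hand:  #{by_hand}"
puts "shortcut: #{shortcut}"
```

Either value can stand in for the RUBY constant in the examples which follow.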
In order to make the order of events clearer, we'll force the standard output stream into synchronised mode. This will cause it to flush its buffer after every write.
$stdout.sync = true
Finally, we'll wrap all of the code which follows in this protective if statement:
if $PROGRAM_NAME == __FILE__
  # ...
end
This will ensure that the demo code won't be re-executed when we require the source file within sub-processes.
Method #1: The Backtick Operator
The simplest way to execute a sub-process in Ruby is with the backtick (`). This method, which harks back to Bourne Shell scripting and Perl, is concise and often gives us exactly as much interaction as we need with a sub-process. The backtick, while it may look like a part of Ruby's core syntax, is technically an operator defined by Kernel. Like most Ruby operators it can be redefined in your own code, although that's beyond the scope of this article. Kernel defines the backtick operator as a method which executes its argument in a subshell.
puts "1. Backtick operator"
output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
output.split("\n").each do |line|
  puts "[parent] output: #{line}"
end
puts
Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:
1. Backtick operator
[child] Hello, standard error
[parent] output: [child] Hello from backticks
[parent] output: [child] No stdin, or stdin is same as parent's
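The backtick operator doesn't return until the command has finished. It captures the sub-process's standard output and returns it as a string, while the sub-process inherits its standard input and standard error streams from the parent process. That is why the standard error line above appears before the captured output. The process's exit status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).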
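To illustrate the exit status, here is a minimal sketch that runs a child which deliberately exits with status 3. It uses RbConfig.ruby to locate the interpreter for brevity, which is an assumption of this example rather than part of the demo source.

```ruby
require 'rbconfig'

# Path to the running interpreter (assumption: RbConfig.ruby is used here
# instead of the article's RUBY constant to keep the sketch self-contained).
ruby = RbConfig.ruby

# Run a child that writes to standard output and exits with a non-zero status.
output = `#{ruby} -e 'puts "partial result"; exit 3'`

# $? holds the Process::Status of the child that just finished.
status = $?

puts "[parent] output:     #{output.chomp}"
puts "[parent] success?:   #{status.success?}"
puts "[parent] exitstatus: #{status.exitstatus}"
```

Note that the captured output is still returned even though the command failed, so checking $? is the only way to tell the two cases apart.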
We can use the %x operator as an alternate syntax for backticks, which enables us to select arbitrary delimiters for the command string. E.g. %x{echo `which cowsay`}.
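As a small sketch of the equivalence, the following runs the same command with backticks and with two different %x delimiters, then shows a case where %x saves us from escaping. The commands are plain echo calls chosen only for illustration.

```ruby
# All three forms invoke the same Kernel method and return standard output.
a = `echo hello`
b = %x{echo hello}
c = %x[echo hello]

# With %x the command itself may contain backticks without escaping;
# the inner backticks are interpreted by the shell, not by Ruby.
d = %x{echo `echo nested`}

puts "[parent] a == b == c: #{a == b && b == c}"
puts "[parent] nested: #{d.chomp}"
```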
Method #2: Kernel#system
Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the standard output of the finished command, system returns a value indicating the success or failure of the command. If the command exits with a zero status (indicating success), system will return true. If it exits with a non-zero status, system returns false. In Ruby 1.9 and later, system returns nil when the command could not be executed at all.
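The possible return values can be sketched as follows, assuming Ruby 1.9 or later. The command name no-such-command-for-this-demo is hypothetical and is assumed not to exist on the system; RbConfig.ruby is used to locate the interpreter.

```ruby
require 'rbconfig'

ruby = RbConfig.ruby

# Child exits with status 0: system returns true.
ok = system(ruby, "-e", "exit 0")

# Child exits with a non-zero status: system returns false.
failed = system(ruby, "-e", "exit 1")

# The command cannot be executed at all: system returns nil.
# (Hypothetical command name, assumed not to be installed.)
missing = system("no-such-command-for-this-demo")

puts "[parent] ok:      #{ok.inspect}"
puts "[parent] failed:  #{failed.inspect}"
puts "[parent] missing: #{missing.inspect}"
```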
puts "2. Kernel#system"
success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
puts "[parent] success: #{success}"
puts
This results in:
2. Kernel#system
[child] Hello from system()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] success: true
Just like the backtick operator, system doesn't return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process' standard input, output, and error streams.
As we can see in the example above, when system() is given multiple arguments, the first is taken as the program to run and the rest are passed to it as its arguments, without going through a shell. This feature can make system() a little more convenient than backticks for executing complex commands, since there is no need to quote or escape the individual arguments. For this reason and because it's more visually apparent in the code, I prefer to use Kernel#system over backticks unless I need to capture the command's output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
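The difference between the single-string and multiple-argument forms can be seen in a short sketch using echo, chosen here only for illustration. In the first call the argument reaches echo untouched; in the second the whole string is handed to the shell, which expands the variable first.

```ruby
$stdout.sync = true

# Multiple arguments: no shell is involved, so echo receives the
# literal text $HOME as its argument.
literal_ok = system("echo", "$HOME")

# Single string containing shell syntax: the command line is run via
# the shell, which expands $HOME before echo sees it.
expanded_ok = system("echo $HOME")

puts "[parent] literal_ok: #{literal_ok}, expanded_ok: #{expanded_ok}"
```

The multiple-argument form is the safer choice whenever an argument comes from outside the program, because nothing in it can be interpreted as shell syntax.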
Method #3: Kernel#fork (aka Process.fork)
Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we've examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.
Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.
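For comparison, the traditional style without a block might look like the sketch below. When called without a block, fork returns nil in the child and the child's process ID in the parent, so the same code has to branch on the return value.

```ruby
$stdout.sync = true

pid = fork
if pid.nil?
  # fork returned nil, so this branch runs in the child process.
  puts "[child] running as pid #{Process.pid}"
  # Leave immediately; exit! skips any at_exit handlers copied from the parent.
  exit!(0)
else
  # fork returned the child's pid, so this branch runs in the parent.
  Process.wait(pid)
  puts "[parent] child #{pid} has finished"
end
```

The block form is shorter and makes it harder to forget the exit call, which would otherwise let the child carry on into code meant for the parent.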
puts "3. Kernel#fork"
pid = fork do
  hello("fork()", false)
end
Process.wait(pid)
puts "[parent] pid: #{pid}"
puts
This produces the following output:
3. Kernel#fork
[child] Hello from fork()
[child] No stdin, or stdin is same as parent's
[child] Hello, standard error
[parent] pid: 19935
Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.
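Waiting also gives us access to the child's exit status. As a minimal sketch, Process.wait2 behaves like Process.wait but returns the Process::Status alongside the process ID; the exit code 7 is arbitrary and chosen only for the demonstration.

```ruby
$stdout.sync = true

pid = fork do
  # Exit with an arbitrary status; exit! skips at_exit handlers
  # copied from the parent process.
  exit!(7)
end

# wait2 blocks until the child finishes and returns [pid, status].
waited_pid, status = Process.wait2(pid)

puts "[parent] waited for: #{waited_pid}"
puts "[parent] exitstatus: #{status.exitstatus}"
```

After a plain Process.wait, the same status object is available in $?.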
The sub-process inherits its standard input, output, and error streams from the parent. Since fork is a *NIX-only syscall, it will only reliably work on UNIX-style systems.
Conclusion
In this first installment in the Ruby Sub-processes series we've looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we'll delve into some methods for doing more complex communication with spawned sub-processes.
