• A dozen (or so) ways to start sub-processes in Ruby: Part 1

    Introduction

    It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby's green threading doesn't provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.

    Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I'll be demonstrating are applicable to running any type of program in a sub-process. I'll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we'll leave that for another series.

    In the first and second articles, I'll demonstrate some of the facilities for starting sub-processes that Ruby possesses out of the box, no requires needed. In the third article we'll look at some tools provided in Ruby's Standard Library which build on the methods introduced in part one. And in the fourth installment I'll briefly survey a few of the many RubyGems which simplify sub-process interactions.

    Getting Started

    To begin, let's define a few helper methods and constants which we'll refer back to throughout the series. First, let's define a simple method which will serve as our "slave" code - the code we want to execute in a sub-process. Here it is:

    def hello(source, expect_input)
      puts "[child] Hello from #{source}"
      if expect_input
        puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
      else
        puts "[child] No stdin, or stdin is same as parent's"
      end
      $stderr.puts "[child] Hello, standard error"
    end
    


    (Note: The full source code for this article can be found at http://gist.github.com/137705)

    This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we'll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.

    Next, let's define a couple of helpful constants.

    require 'rbconfig'
    THIS_FILE = File.expand_path(__FILE__)
    
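    # RbConfig::CONFIG replaces the old Config::CONFIG alias, which is deprecated
    # and no longer available in modern Rubies.
    RUBY = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])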
    

    The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.
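    Newer versions of Ruby also provide RbConfig.ruby, which builds the same path and appends the platform's executable extension. A small sketch comparing the two, assuming a reasonably modern Ruby:

```ruby
require 'rbconfig'

# The path built by hand, exactly as in the RUBY constant above.
by_hand = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])

# The built-in helper uses the same directory and name, plus EXEEXT
# (an empty string on UNIX-like systems, ".exe" on Windows).
shortcut = RbConfig.ruby

puts by_hand
puts shortcut
```

    On UNIX-style platforms both lines should print the same path.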

    In order to make the order of events clearer, we'll force the standard output stream into synchronised mode. This will cause it to flush its buffer after every write.

    $stdout.sync = true

    Finally, we'll wrap all of the code that follows in this protective if statement:

    if $PROGRAM_NAME == __FILE__
    # ...
    end
    

    This will ensure that the demo code won't be re-executed when we require the source file within sub-processes.
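    The guard works because when Ruby runs code given with -e, $PROGRAM_NAME is set to the string "-e", so it never matches the __FILE__ of a file loaded with -r. The sketch below demonstrates this with a throwaway file (the name guard_demo.rb is made up for the example) written to a temporary directory and run both ways:

```ruby
require 'rbconfig'
require 'shellwords'
require 'tmpdir'

ruby = RbConfig.ruby.shellescape
as_script = as_library = nil

Dir.mktmpdir do |dir|
  demo = File.join(dir, 'guard_demo.rb')
  File.write(demo, <<~'CODE')
    def hello
      print "hello"
    end

    # Only runs when this file is the program Ruby was asked to execute.
    if $PROGRAM_NAME == __FILE__
      print "main:"
      hello
    end
  CODE

  # Run as a script: $PROGRAM_NAME is the file's path, so the guarded code runs.
  as_script = `#{ruby} #{demo.shellescape}`
  # Loaded with -r: $PROGRAM_NAME is "-e", so only the explicit call runs.
  as_library = `#{ruby} -r#{demo.shellescape} -e hello`
end

puts as_script   # => main:hello
puts as_library  # => hello
```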

    Method #1: The Backtick Operator

    The simplest way to execute a sub-process in Ruby is with the backtick (`). This method, which harks back to Bourne shell scripting and Perl, is concise and often gives us exactly as much interaction with a sub-process as we need. Although the backtick may look like part of Ruby's core syntax, it is technically a method, Kernel#`, which executes its argument in a subshell and returns the subshell's standard output as a String. Like most Ruby operators it can be redefined in your own code, although that's beyond the scope of this article.

    puts "1. Backtick operator"
    output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
    output.split("\n").each do |line|
      puts "[parent] output: #{line}"
    end
    puts
    

    Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:

    1. Backtick operator
    [child] Hello, standard error
    [parent] output: [child] Hello from backticks
    [parent] output: [child] No stdin, or stdin is same as parent's
    

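    The backtick operator doesn't return until the command has finished. The sub-process inherits its standard input and standard error streams from the parent process, while its standard output is captured and returned as a String. That is why "[child] Hello, standard error" appears first above: it goes straight to the terminal, whereas the child's standard output only appears once the parent prints it. The process's exit status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).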
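    A quick sketch of inspecting $? after a backtick call. The child is a throwaway one-liner, chosen only for illustration, that prints a line and exits with status 3:

```ruby
require 'rbconfig'
require 'shellwords'

ruby = RbConfig.ruby.shellescape

# The child writes one line to standard output, then exits with status 3.
output = `#{ruby} -e 'puts "partial output"; exit 3'`
status = $?  # Process::Status describing the child that just finished

puts output.inspect     # => "partial output\n"
puts status.exitstatus  # => 3
puts status.success?    # => false
```

    Note that backticks do not raise an exception when the command fails; checking $? is up to the caller.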

    We can use the %x literal as an alternate syntax for backticks; it lets us choose arbitrary delimiters for the command string, e.g. %x{echo `which cowsay`}.
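    A small sketch showing that the two forms are interchangeable, and why a different delimiter helps when the command itself contains backticks (here the inner backticks are interpreted by the shell, not by Ruby):

```ruby
plain   = `echo same`
percent = %x(echo same)

# Braces as delimiters let the command contain literal backticks,
# which /bin/sh then expands as its own command substitution.
nested = %x{echo `echo nested`}

puts plain == percent  # => true
puts nested            # => nested
```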

    Method #2: Kernel#system

    Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the standard output of the finished command, system returns a value indicating the success or failure of the command: true if the command exits with a zero status (indicating success), false if it exits with a non-zero status, and nil if the command could not be executed at all.
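    A sketch of the three possible results. The command name no-such-command-xyz is made up and assumed not to exist on your PATH:

```ruby
require 'rbconfig'

ruby = RbConfig.ruby

succeeded     = system(ruby, '-e', 'exit 0')  # zero exit status     -> true
failed        = system(ruby, '-e', 'exit 1')  # non-zero exit status -> false
failed_status = $?.exitstatus                 # 1, from the child above
missing       = system('no-such-command-xyz') # could not be run     -> nil

p succeeded  # => true
p failed     # => false
p missing    # => nil
```

    Since Ruby 2.6, passing exception: true makes system raise an error instead of returning false or nil.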

    puts "2. Kernel#system"
    success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
    puts "[parent] success: #{success}"
    puts
    

    This results in:

    2. Kernel#system
    [child] Hello from system()
    [child] No stdin, or stdin is same as parent's
    [child] Hello, standard error
    [parent] success: true
    

    Just like the backtick operator, system doesn't return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process' standard input, output, and error streams.

    As we can see in the example above, system() can be given multiple arguments. In that case the first argument names the program to run and the rest are passed to it as-is, without going through a shell, so there is no need to quote or escape them. This feature can make system() a little more convenient than backticks for executing complex commands. For this reason, and because it stands out more clearly in the code, I prefer to use Kernel#system over backticks unless I need to capture the command's output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
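    The difference between the multi-argument and single-string forms is easy to observe with a child that reports how many arguments it received. In this sketch the child simply exits with ARGV.length as its status, a trick chosen only for illustration:

```ruby
require 'rbconfig'
require 'shellwords'

ruby = RbConfig.ruby
count_args = 'exit ARGV.length'  # the child exits with its argument count

# Multiple arguments: no shell is involved, so each string arrives as exactly
# one argument, even though it contains spaces and shell metacharacters.
system(ruby, '-e', count_args, 'two words', 'semi;colon', '$HOME')
direct_count = $?.exitstatus

# Single string: the shell splits "two words" on whitespace.
system("#{ruby.shellescape} -e #{Shellwords.escape(count_args)} two words")
shell_count = $?.exitstatus

puts direct_count  # => 3
puts shell_count   # => 2
```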

    Method #3: Kernel#fork (aka Process.fork)

    Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we've examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.

    Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.
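    For comparison, here is a sketch of the traditional style: called without a block, fork returns the child's PID in the parent and nil in the child. The exit status 7 is arbitrary, used only to show the value arriving back in the parent:

```ruby
pid = fork
if pid.nil?
  # fork returned nil, so this branch runs in the child process.
  puts "[child] running as pid #{Process.pid}"
  $stdout.flush  # exit! skips the usual cleanup, so flush output first
  exit!(7)       # exit immediately, without running at_exit handlers
else
  # fork returned the child's PID, so this branch runs in the parent.
  Process.wait(pid)
  child_status = $?
  puts "[parent] child #{pid} exited with status #{child_status.exitstatus}"
end
```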

    puts "3. Kernel#fork"
    pid = fork do
      hello("fork()", false)
    end
    Process.wait(pid)
    puts "[parent] pid: #{pid}"
    puts
    

    This produces the following output:

    3. Kernel#fork
    [child] Hello from fork()
    [child] No stdin, or stdin is same as parent's
    [child] Hello, standard error
    [parent] pid: 19935
    

    Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.

    The sub-process inherits its standard input, output, and error streams from the parent. Since fork is a *NIX-only syscall, it is only available on UNIX-style systems; on Windows, calling it raises NotImplementedError.
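    Because the child gets its own copy of the parent's memory, changes made in the child are invisible to the parent. This sketch first checks that fork is available (Process.respond_to?(:fork) should be false on platforms without it, such as Windows) and uses Process.wait2, which returns both the PID and the Process::Status:

```ruby
abort 'fork is not available on this platform' unless Process.respond_to?(:fork)

counter = 0
pid = fork do
  counter += 100                 # modifies the child's copy only
  exit!(counter == 100 ? 0 : 1)  # report what the child saw via its exit status
end

_, status = Process.wait2(pid)   # [pid, Process::Status]

puts status.success?  # => true: the child saw its own increment
puts counter          # => 0: the parent's copy is untouched
```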

    Conclusion

    In this first installment in the Ruby Sub-processes series we've looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we'll delve into some methods for doing more complex communication with spawned sub-processes.

    Got a slow Test::Unit or RSpec suite?
    Devver can run it up to three times faster! Request a beta invite today.

    Posted on June 30th, 2009 by Avdi in Ruby, Tips & Tricks.

  • SimpleDB DataMapper Adapter: Progress Report

    From the beginning of Devver, we decided we wanted to work with some new technologies and to be able to scale easily. After looking at our options, we found that AWS offered many technologies that could help us build and scale a system like Devver. One of these technologies was SimpleDB. Another new thing we decided to try was DataMapper (DM) rather than the more familiar ActiveRecord. This eventually led me to work on my own SimpleDB DataMapper adapter.

    Searching for ways to work with SDB using Ruby, we found a SimpleDB DM adapter by Jeremy Boles. It worked well initially, but as our needs grew (and to keep it compatible with the current version of DM) it became necessary to add and update features in the adapter. These changes lived hidden in our project's code for a while, for no other reason than that we were too lazy to commit them back to GitHub. Recently, though, there has been renewed interest in working with SimpleDB from Ruby. I started pushing the code updates to GitHub, and then I got a couple of requests and suggestions here and there to improve the adapter. One of these suggestions came from Ara Howard, who is doing impressive work of his own on Ruby and AWS, specifically SimpleDB. He suggested moving from the aws_sdb gem to right_aws, which, along with other changes, improved performance significantly (1.6x on writes, and up to 36x when reading large queries over the default limit of 100 objects). Besides the performance improvements, we have recently added limit and sorting support to the adapter.

    As I added features, testing the adapter also became slow (over a minute per run), because the functional tests actually connect to and use SimpleDB. Since Devver is all about speeding up Ruby tests, I decided to get the tests running on Devver. It was very easy, and it sped up the test suite from 1 minute and 8 seconds down to 28 seconds. You can check out how much Devver speeds up the results yourself.

    We are currently using the SimpleDB adapter to power our Devver.net website as well as the Devver backend service. It has been working well for us, but we know that it doesn't cover everyone's needs. Next time you are creating a simple project, give SimpleDB a look. We would love feedback about the DM adapter, and it would be great to get other people contributing to the project. If anyone does fork my SDB Adapter GitHub repo, feel free to send me pull requests. Also, let me know if you want to try using Devver as you hack on the adapter: it can really speed up testing, and I would be happy to give out a free account.

    Lastly, at a recent Boulder Ruby Users Group meetup, the group did a code review of the adapter. It went well, and I should soon finish cleaning up the code and commit the improvements the group suggested to GitHub.

    Update: The refactorings suggested at the code review are now live on GitHub.

    Posted on June 22nd, 2009 by Dan in Amazon Web Services, Development, Ruby.

  • We’re hiring!

    We're looking for an awesome Ruby developer to join our team. Get more details at http://devver.net/jobs.

    Posted on June 15th, 2009 by Ben in Uncategorized.

  • Boulder CTO Lunch with Matt McAdams

    Dan usually goes to the Boulder CTO lunches, but he was out of town this month, which meant I had the pleasure of hanging out with some of Boulder's best and brightest.

    This month's guest was Matt McAdams of TrackVia. TrackVia is an online database that is powerful yet simple enough for people who are used to keeping data in spreadsheets (primarily business people). Matt gave a candid and often hilarious talk that touched on technical topics and, luckily for me, on pricing and metrics, two topics I'm currently very interested in.

    On technology decisions:

    Matt wasn't a database guy originally, but he used the practical knowledge he gained working on a previous startup

    Went with the simplest design that could work and it's continued to scale well

    Smart technology decisions have allowed TrackVia to compete with a small, lean development team

    On product development:

    TrackVia started as a contract project for a single customer, but they saw the broader appeal

    One of the earliest databases in TrackVia is the bug database (still around).  In other words, they've been dogfooding since day one.

    They don't worry about the competition. Instead, they focus on building the features that get people to sign up and pay.

    On pricing:

    You've got to try things and iterate. TrackVia has changed their pricing several times.

    Customers on the old pricing models have always been grandfathered in.

    Sometimes raising your price can actually gain customers because some people assume that a cheap product or service must be low-quality (even if it's actually very high quality).

    If big customers really want feature X, it's OK to ask them to pay extra to accelerate the development of that feature (or to customize their experience).

    On metrics:

    Good metrics allow you to try different strategies and measure their effect.

    You must measure, tweak, and iterate.

    If you can iterate on a weekly basis and your competition can iterate on a quarterly basis, you'll win.

    Metrics must continually be improved. TrackVia spends a lot of time tracking useful metrics, but even they know they need to add additional metrics in some key areas.


    As usual, the CTO lunch was a great place to hear from other Boulder companies and I learned a lot. Thanks to everyone who attended, and special thanks to Matt for leading our discussion.

    Posted on June 4th, 2009 by Ben in Boulder.

  • Spellcheck your files with Aspell and Rake

    We recently redid our website. The new site included a new design and much more content explaining what we do. We wanted a quick way to check over everything and make sure we didn't miss any spelling errors or typos. First I started looking for a web service that could scan the site for spelling errors. I found spellr.us, which is nice but would only catch errors once they were live. It also couldn't scan pages that require logging in.

    I was pairing with Avdi, who suggested we just run Aspell, which worked out great. We originally tried to create a simple Emacs macro to go through all our HTML files and check them, but in the end we created simple Rake tasks, which make it really easy to integrate spellchecking into CI. Once Avdi figured out the Aspell commands we needed to run on each file to get the information we wanted, it was easy to wrap them using Rake's FileList. To keep everyone on the same setup, we created a local dictionary of words to ignore or accept and keep it checked into source control as well.

    The final solution grabs all the files you want to spellcheck, then runs them through Aspell with HTML filtering. We have two tasks: one that runs in interactive mode so the user can fix mistakes, and one for CI that simply fails if it finds any errors.
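    The original tasks aren't reproduced here, so what follows is only a sketch of the approach described above, assuming Aspell is on the PATH. The file glob, the dictionary path config/spelling.pws, and the task and helper names are all hypothetical. It relies on Aspell's check command (an interactive session for one file) and its list command (prints each unknown word read from standard input), with --mode=html for HTML filtering and --personal for the shared word list:

```ruby
require 'rake'
require 'open3'

# Hypothetical project layout: adjust the glob and dictionary path to taste.
SPELL_FILES = Rake::FileList['public/**/*.html']
# Aspell resolves a relative --personal path against its home directory,
# so pass an absolute path to the word list kept in source control.
SPELL_DICT = File.expand_path('config/spelling.pws')

# Common Aspell arguments: HTML filter plus the shared personal dictionary.
def aspell_command(*args)
  ['aspell', '--mode=html', "--personal=#{SPELL_DICT}", *args]
end

# Returns the unique words Aspell flags in +html+, via `aspell list`.
def misspelled_words(html)
  out, status = Open3.capture2(*aspell_command('list'), stdin_data: html)
  raise "aspell exited with #{status.exitstatus}" unless status.success?
  out.split("\n").uniq
end

namespace :spell do
  desc 'Spellcheck HTML files interactively so mistakes can be fixed'
  task :check do
    SPELL_FILES.each do |file|
      system(*aspell_command('check', file)) or fail "aspell failed on #{file}"
    end
  end

  desc 'Fail if any HTML file contains unknown words (for CI)'
  task :ci do
    problems = SPELL_FILES.to_a.each_with_object({}) do |file, found|
      words = misspelled_words(File.read(file))
      found[file] = words unless words.empty?
    end
    problems.each { |file, words| puts "#{file}: #{words.join(', ')}" }
    fail "Spelling errors found in #{problems.size} file(s)" unless problems.empty?
  end
end
```

    With this in a Rakefile, rake spell:check would walk through each file interactively, while rake spell:ci stays non-interactive and fails the build when Aspell reports any unknown words.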

    Posted on May 26th, 2009 by Dan in Development, Ruby, Tips & Tricks, Tools.

  • Devver.net has a new look!

    Tonight we launched the brand new version of Devver at http://devver.net. It's not perfect (yeah, yeah, we know the blog doesn't match - that should be fixed in the next week or so), but we're trying to "release early, release often." Let us know how you like the new look and how we can improve it!

    Posted on May 14th, 2009 by Ben in Uncategorized.

  • Single-file Sinatra apps with specs baked-in

    It's so easy to create little single-file apps in Sinatra that it almost seems a shame to start a second file just for tests.  The other day Dan and I decided to see if we could create a Sinatra app with everything - including the tests - baked right in.  Here's what we came up with.

    The code switches modes on the name of the executable used to run the file. If we run it with the spec command, we get a test run:

    $ spec -fs sinatra-tests-baked-in.rb

    Example App
    - should serve a greeting
    - should serve content as text/plain

    Finished in 0.007221 seconds

    2 examples, 0 failures

    Otherwise, if we call it as a Ruby program, it runs the Sinatra server as we would expect:

    $ ruby sinatra-tests-baked-in.rb
    == Sinatra/0.9.1.1 has taken the stage on 4567 for development with backup from Thin
    >> Thin web server (v1.0.0 codename That's What She Said)
    >> Maximum connections set to 1024
    >> Listening on 0.0.0.0:4567, CTRL+C to stop

    And there you have it: a true single-file application, specs and all.
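    The original code isn't reproduced here, but the mode switch could be sketched like this. The helper name baked_in_mode is hypothetical, and the sketch assumes the runner is named spec (RSpec 1, as in the session above) or rspec (later versions):

```ruby
# Hypothetical helper: choose a mode from the executable that loaded this file.
# RSpec 1's runner is `spec` (as used above); later versions install `rspec`.
def baked_in_mode(program_name = $PROGRAM_NAME)
  case File.basename(program_name)
  when 'spec', 'rspec' then :spec  # loaded by the spec runner: define examples
  else :server                     # run directly with `ruby`: start Sinatra
  end
end

p baked_in_mode('/usr/local/bin/spec')        # => :spec
p baked_in_mode('sinatra-tests-baked-in.rb')  # => :server
```

    In the :spec branch the file would load its test helpers and define the examples; in the :server branch it would let Sinatra start as usual.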

    Posted on May 13th, 2009 by Avdi in Development, Hacking, Ruby.

  • Our Tools & Practices for Remote Collaboration

    Last week, we had Avdi, the newest addition to our team, join us in Boulder, CO. It was great to get some face-to-face time, since Avdi will primarily be working from his home in Pennsylvania while Dan and I continue to work in Boulder.

    We are excited about the benefits of having a distributed team, but we're also aware that there are a number of challenges. As a result, one of the things we worked on last week was figuring out the tools and practices we'll be using to work effectively from across the country. Luckily, both Avdi and Dan have experience working remotely which we can draw upon.

    We evaluated a number of options, but settled on the following tools and practices.

    Practices

    • Daily Standup. Every day at the same time, we all get on video chat. We cover what we did yesterday, what we're working on today, and whether or not we're blocked on anything. The goal is to keep this meeting at 15 min or less.
    • Minimize interruptions. Whenever we need to communicate with each other, we try to do so on the channel that is the least disruptive (and disrupts the fewest team members). Of course, sometimes we need to be disruptive if an issue is pressing, if someone is blocked, or if we need high-bandwidth communication (information, especially cues like body language, doesn't come across very effectively on channels like email).
    • Keep it simple. We want to use the smallest number of tools and channels that will allow us to work effectively.

    Channels and Tools

    Channels are listed from least disruptive to most disruptive.

    • Passive updates (Present.ly): asynchronous; not required reading.
    • Email (any email client; in practice, Gmail): asynchronous; required reading (usually); sometimes time-sensitive, sometimes not.
    • IM (Skype): semi-synchronous (but usually synchronous); usually time-sensitive.
    • Voice/video chat (Skype): synchronous; high bandwidth* (especially video chat); best for meetings.

    * By "high bandwidth", I don't mean that the tool itself requires a lot of TCP/IP traffic (although this is true, it doesn't really matter). What I mean is that we can communicate a lot of information between team members in a short amount of time.

    Other Tools

    • Lighthouse for issue tracking
    • GitHub for source control and our project wiki
    • RealVNC for screen sharing (essential for remote pair programming)

    This is our first attempt at finding a good set of tools and practices for remote collaboration. As time goes on, we'll undoubtedly iterate and improve upon these.

    For another perspective (with a slightly different set of tools), here is a presentation from 2008 about virtual teams.

    What tools and practices have worked (and which have not worked) for your team?

    Posted on April 28th, 2009 by Ben in Development, Devver, Tips & Tricks, Tools.

  • The newest member of the Devver team

    Dan and I are very excited to welcome Avdi Grimm to our small team here at Devver. Avdi has over ten years of professional experience working at Raytheon and MDLogix and has created and contributed to a number of open-source projects. He shares our vision of bringing developer tools to the cloud, our goal of using testing to consistently deliver high-quality software, and our commitment to openness between our company, our users, and the community at large. Welcome, Avdi!

    Posted on April 15th, 2009 by Ben in Devver.

  • Lessons learned from our hiring process

    We're very happy to announce that we've had a great developer accept our offer to join our team. We'll have more details on our newest team member soon.

    This was our first attempt at hiring and we certainly learned a lot. While our process has improved since we started, it's still a work in progress.

    The first lesson we learned is that hiring can take a long, long time - in fact, much longer than we expected. I looked back through my emails and we officially started looking for someone more than four months ago. One thing we heard time and time again was the importance of hiring only the best people - and, just as importantly, people that fit well within your company culture.

    As a result, we were extremely picky when it came to candidates. As the process goes on and on, it's easy to get frustrated and want to lower your bar. Avoid this temptation! If at all possible, go the other route: raise your compensation, improve your pitch, and raise your profile so you can attract candidates that meet or exceed your bar.

    When we started, we severely underestimated how long it would take and we overestimated how many applications we'd get. Because of these bad assumptions, we made our first mistake - we chose to slowly "roll out" the announcement that we were hiring. First, we just told friends and mentors. A week or so later, we tweeted and put up a blog post. Later on, we posted on some news sites and newsgroups. And after that, we finally created a job posting at Startuply. Waiting between each step was a waste of time - we should have just broadcast as loudly as we could the first day.

    We ended up getting a good number of applications, the vast majority of which were from great programmers. We assumed that going through them would waste a bunch of time. We were wrong. In reality, it was really easy to turn down many applications just from their resume (usually the applicant was good, but didn't have the skill set we were looking for). Phone calls were a bit more time-consuming, but were never a waste of time. We always enjoyed talking to candidates, always found a way to improve our process, and usually learned something new about the Ruby ecosystem.

    One thing that we improved on was learning to say "no" quickly. Even with the manageable number of candidates we talked to, it was important to say "no" as soon as we discovered something that wouldn't fit. Early on, we were a bit hesitant to do so and ended up continuing our process for too long. That wasn't fair to either ourselves or the candidates. It became easier to shut things down quickly once we had interviewed a few people and started to gain confidence that new applications would show up, even if the queue was currently empty.

    We also learned that it's important to have a fairly short process. Of course, the biggest priority is to have a process that lets you learn enough about a candidate to be confident in your decision. That said, as you improve, you'll find that you can shorten your process while maintaining your confidence. Shortening the process is better for you (less time spent) and better for your candidates.

    The initial version of our process went something like this:

    1. Check resume, blog, Github account
    2. Introductory phone call to learn about candidate and convince them Devver is cool
    3. Second phone call to ask some high-level technical questions
    4. Ask for and call references
    5. Remotely pair on a project for a few hours with the candidate
    6. Fly candidate out to Boulder to meet and discuss technical and business issues

    The final version was a bit more compressed:

    1. Check resume, blog, Github account
    2. Phone call to learn about them, introduce ourselves, and cover high-level technical questions
    3. Ask for and call references
    4. (In parallel with #3) Ask candidate to write small Ruby application and review their code
    5. Fly candidate out to do some pair programming and discuss technical and business issues

    All in all, it was a good (if sometimes painful) learning experience. We deeply appreciate the time and effort spent by every single applicant (and their patience with us as we learned by trial and error).

    If you've got your own lessons you've learned while hiring, please let us know. We know we've still got a lot to learn...

    Posted on March 27th, 2009 by Ben in Boulder, Devver.