• A dozen (or so) ways to start sub-processes in Ruby: Part 1

    Introduction

    It is often useful in Ruby to start a sub-process to run a particular chunk of Ruby code. Perhaps you are trying to run two processes in parallel, and Ruby's green threading doesn't provide sufficient concurrency. Perhaps you are automating a set of scripts. Or perhaps you are trying to isolate some untrusted code while still getting information back from it.

    Whatever the reason, Ruby provides a wealth of facilities for interacting with sub-processes, some better known than others. In this series of articles I will be focusing on running Ruby as a sub-process of Ruby, although many of the techniques I'll be demonstrating are applicable to running any type of program in a sub-process. I'll also be keeping the focus on UNIX-style platforms, such as Linux and Mac OS X. Sub-process handling on Windows differs significantly, and we'll leave that for another series.

    In the first and second articles, I'll demonstrate some of the facilities for starting sub-processes that Ruby possesses out-of-the-box, no requires needed. In the third article we'll look at some tools provided in Ruby's Standard Library which build on the methods introduced in part one. And in the fourth instalment I'll briefly survey a few of the many Rubygems which simplify sub-process interactions.

    Getting Started

    To begin, let's define a few helper methods and constants which we'll refer back to throughout the series. First, let's define a simple method which will serve as our "slave" code - the code we want to execute in a sub-process. Here it is:

    def hello(source, expect_input)
      puts "[child] Hello from #{source}"
      if expect_input
        puts "[child] Standard input contains: \"#{$stdin.readline.chomp}\""
      else
        puts "[child] No stdin, or stdin is same as parent's"
      end
      $stderr.puts "[child] Hello, standard error"
    end
    


    (Note: The full source code for this article can be found at http://gist.github.com/137705)

    This method prints a message to the standard output stream, a message to the standard error stream, and optionally reads and prints a message from the standard input stream. One of the things we'll be exploring in this series is the differing ways in which the various sub-process-starting methods handle standard I/O streams.

    Next, let's define a couple of helpful constants.

    require 'rbconfig'
    THIS_FILE = File.expand_path(__FILE__)
    
    RUBY = File.join(RbConfig::CONFIG['bindir'], RbConfig::CONFIG['ruby_install_name'])
    

    The first, THIS_FILE, is simply the fully-qualified name of the file containing our demo source code. RUBY, the second constant, is set to the fully-qualified path of the running Ruby executable. These constants will come in handy with sub-process methods which require an explicit shell command to be run.
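    As a small aside, here is a sketch of two ways to locate the running interpreter. It assumes a Ruby new enough to provide RbConfig.ruby; the variable names are only for illustration and are not part of the demo source.

```ruby
require 'rbconfig'

# Build the interpreter path by hand from the configuration hash.
ruby_from_parts = File.join(RbConfig::CONFIG['bindir'],
                            RbConfig::CONFIG['ruby_install_name'])

# Ask RbConfig for the same path directly (assumes a Ruby that provides RbConfig.ruby).
ruby_direct = RbConfig.ruby

puts "assembled: #{ruby_from_parts}"
puts "direct:    #{ruby_direct}"
```

    On UNIX-style systems the two values should normally agree; the hand-assembled form is the one used for the RUBY constant in this article.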

    In order to make the order of events clearer, we'll force the standard output stream into synchronised mode. This will cause it to flush its buffer after every write.

    $stdout.sync = true

    Finally, we'll wrap all of the code which follows in this protective if-statement:

    if $PROGRAM_NAME == __FILE__
    # ...
    end
    

    This will ensure that the demo code won't be re-executed when we require the source file within sub-processes.

    Method #1: The Backtick Operator

    The simplest way to execute a sub-process in Ruby is with the backtick (`). This method, which harks back to Bourne Shell scripting and Perl, is concise and often gives us exactly as much interaction as we need with a sub-process. The backtick, while it may look like a part of Ruby's core syntax, is technically an operator defined by Kernel. Like most Ruby operators it can be redefined in your own code, although that's beyond the scope of this article. Kernel defines the backtick operator as a method which executes its argument in a subshell.

    puts "1. Backtick operator"
    output = `#{RUBY} -r#{THIS_FILE} -e'hello("backticks", false)'`
    output.split("\n").each do |line|
      puts "[parent] output: #{line}"
    end
    puts
    

    Here, we use backticks to execute a child Ruby process which loads our demo source code and executes the hello method. This yields:

    1. Backtick operator
    [child] Hello, standard error
    [parent] output: [child] Hello from backticks
    [parent] output: [child] No stdin, or stdin is same as parent's
    

    The backtick operator doesn't return until the command has finished. The sub-process inherits its standard input and standard error streams from the parent process. The process' ending status is made available as a Process::Status object in the $? global (aka $CHILD_STATUS if the English library is loaded).
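    To make the exit-status behaviour concrete, here is a minimal sketch. It is separate from the demo source: the child is a throwaway one-liner, and Shellwords is used only to guard against spaces in the interpreter path.

```ruby
require 'rbconfig'
require 'shellwords'

# Escape the interpreter path in case it contains spaces.
ruby = Shellwords.escape(RbConfig.ruby)

# Run a child that prints one line and then exits with status 3.
output = `#{ruby} -e 'puts "some output"; exit 3'`
status = $? # Process::Status for the command that just finished

puts "[parent] captured: #{output.chomp}"
puts "[parent] exit status: #{status.exitstatus}"
puts "[parent] success? #{status.success?}"
```

    Copying $? into a local variable straight away is a defensive habit, since any later sub-process call will overwrite it.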

    We can use the %x operator as an alternate syntax for backticks, which enables us to select arbitrary delimiters for the command string. E.g. %x{echo `which cowsay`}.
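    A short sketch of why the alternate delimiters are handy: inside %x{...} a backtick is just an ordinary character, so it reaches the shell untouched and the shell performs its own command substitution. The commands here are arbitrary examples.

```ruby
# The inner backticks are interpreted by the shell, not by Ruby.
nested = %x{echo `echo nested`}

# Any matching pair of delimiters works; square brackets are used here.
greeting = %x[echo "hello"]

puts nested
puts greeting
```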

    Method #2: Kernel#system

    Kernel#system is similar to the backtick operator in operation, with one important difference. Where the backtick operator returns the STDOUT of the finished command, system returns a Boolean value indicating the success or failure of the command. If the command exits with a zero status (indicating success), system will return true. Otherwise it returns false.

    puts "2. Kernel#system"
    success = system(RUBY, "-r", THIS_FILE, "-e", 'hello("system()", false)')
    puts "[parent] success: #{success}"
    puts
    

    This results in:

    2. Kernel#system
    [child] Hello from system()
    [child] No stdin, or stdin is same as parent's
    [child] Hello, standard error
    [parent] success: true
    

    Just like the backtick operator, system doesn't return until its process has exited, and leaves the process exit status in $?. The sub-process inherits the parent process' standard input, output, and error streams.
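    The possible return values can be sketched as follows. Besides true and false, system returns nil when the command could not be run at all. The command name in the last call is made up and is assumed not to exist on your machine.

```ruby
require 'rbconfig'

# Exit status zero: system returns true.
ok = system(RbConfig.ruby, "-e", "exit 0")

# Non-zero exit status: system returns false, and $? holds the details.
failed = system(RbConfig.ruby, "-e", "exit 2")
failed_status = $?.exitstatus

# Command could not be executed at all: system returns nil.
# (Hypothetical command name, assumed not to be installed.)
missing = system("definitely-not-a-real-command-12345")

puts "[parent] ok: #{ok.inspect}"
puts "[parent] failed: #{failed.inspect} (status #{failed_status})"
puts "[parent] missing: #{missing.inspect}"
```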

    As we can see in the example above, when system() is given multiple arguments, the first is treated as the program to run and the rest are passed to it as its arguments, without going through a shell. This can make system() more convenient than backticks for executing complex commands, since the arguments don't need to be quoted or escaped. For this reason and because it's more visually apparent in the code, I prefer to use Kernel#system over backticks unless I need to capture the command's output. Note that there are some other ways system() can be called; see the Kernel#exec documentation for the details.
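    Here is a small sketch of how a multi-argument call hands its arguments to the child. The child script is a throwaway one-liner that prints its arguments and exits with their count; it is only an illustration and not part of the demo source.

```ruby
require 'rbconfig'

# The child prints the arguments it received and exits with how many there were.
script = 'puts "[child] ARGV: #{ARGV.inspect}"; exit ARGV.length'

# Each Ruby argument becomes exactly one argument in the child: the space in
# "hello world" does not split it, and "$HOME" is not expanded by any shell.
system(RbConfig.ruby, "-e", script, "hello world", "$HOME")
arg_count = $?.exitstatus

puts "[parent] child saw #{arg_count} arguments"
```

    Because no shell is involved in this form, values such as filenames can be passed along without quoting.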

    Method #3: Kernel#fork (aka Process.fork)

    Ruby provides access to the *NIX fork() system call via Kernel#fork. On UNIX-like OSes, fork splits the currently executing Ruby process in two. Both processes run concurrently and independently from that point on. Unlike the methods we've examined so far, fork enables us to execute in-line Ruby code in a sub-process, rather than explicitly starting a new Ruby interpreter and telling it to load our code.

    Traditionally we would need to put in some conditional code to examine the return value of fork and determine whether the code was executing in the parent or child process. Ruby makes it easy to specify what code should be run in the child by allowing us to pass a block to fork. The contents of the block will be run in the child process, after which it will exit. The parent will continue running at the point where the block ends.

    puts "3. Kernel#fork"
    pid = fork do
      hello("fork()", false)
    end
    Process.wait(pid)
    puts "[parent] pid: #{pid}"
    puts
    

    This produces the following output:

    3. Kernel#fork
    [child] Hello from fork()
    [child] No stdin, or stdin is same as parent's
    [child] Hello, standard error
    [parent] pid: 19935
    

    Note the call to Process.wait. Since the process spawned by fork runs concurrently with the parent process, we need to explicitly wait for the child process to finish if we want to synchronize with it. We use the child process ID, returned by fork, as the argument to Process.wait.
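    As a minimal sketch of that synchronization, the following waits for a forked child and then inspects how it finished. The exit code 7 is arbitrary, chosen only so that it is recognisable in the output.

```ruby
$stdout.sync = true

pid = fork do
  puts "[child] running as pid #{Process.pid}"
  exit 7 # arbitrary status so we can recognise it in the parent
end

# Block until the child with this pid has finished; this also sets $?.
Process.wait(pid)
child_status = $?

puts "[parent] child #{child_status.pid} exited with status #{child_status.exitstatus}"
```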

    The sub-process inherits its standard input, output, and error streams from the parent. Since fork is a *NIX-only syscall, it will only reliably work on UNIX-style systems.
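    For comparison, here is a sketch of the traditional, block-less style mentioned earlier, in which we branch on the return value of fork ourselves. It assumes a UNIX-style platform. The child leaves via exit! so that it skips any at_exit handlers inherited from the parent, which is one reasonable choice rather than the only one.

```ruby
$stdout.sync = true # unbuffered, so the child's output is not lost by exit!

pid = fork
if pid.nil?
  # fork returned nil: we are in the child process.
  puts "[child] hello from pid #{Process.pid}"
  exit!(0)
else
  # fork returned the child's process ID: we are in the parent.
  Process.wait(pid)
  puts "[parent] child #{pid} finished, success: #{$?.success?}"
end
```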

    Conclusion

    In this first installment in the Ruby Sub-processes series we've looked at three of the simplest ways to start another Ruby process from inside a Ruby program. Stay tuned for part 2, in which we'll delve into some methods for doing more complex communication with spawned sub-processes.

    Got a slow Test::Unit or RSpec suite?
    Devver can run it up to three times faster! Request a beta invite today.

    Posted on June 30th, 2009 by Avdi in Ruby, Tips & Tricks.

  • Spellcheck your files with Aspell and Rake

    We recently redid our website. The new site included a new design and much more content explaining what we do. We wanted a quick way to check over everything and make sure we didn't miss any spelling errors or typos. First I started looking for a web service that could scan the site for spelling errors. I found spellr.us, which is nice but would only catch errors once they were live. It also can't scan any of the pages that require a login.

    I was pairing with Avdi, who thought we should just run Aspell, which worked out great. We were originally trying to create a simple Emacs macro to go through all our HTML files and check them, but in the end we created simple Rake tasks, which make it really easy to integrate spellcheck into CI. After Avdi figured out the commands we needed to run on each file to get the information we needed from Aspell, it was easy to wrap the command using Rake's FileList. To keep everyone on the same setup, we created a local dictionary of words to ignore or accept, and we keep it checked into source control as well.

    The final solution grabs all the files you want to spell check, then runs them through Aspell with HTML filtering. We have two tasks: one that runs in interactive mode so the user can fix mistakes, and one for CI that just fails if it finds any errors.
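    A rough sketch of what such a pair of tasks might look like follows. This is not our actual Rakefile: the file glob, the dictionary path, the task names, and the helper methods are all assumptions made for illustration. It relies on Aspell's --mode=html and --personal options and its check and list commands.

```ruby
require 'rake'
require 'open3'

# Hypothetical locations; adjust both to match your project layout.
HTML_FILES    = Rake::FileList['public/**/*.html']
PERSONAL_DICT = File.expand_path('config/aspell.en.pws')

# Build an aspell command line with HTML filtering and the shared dictionary.
def aspell_command(*extra)
  ['aspell', '--mode=html', "--personal=#{PERSONAL_DICT}", *extra]
end

# Feed a file to `aspell list`, which prints one misspelled word per line.
def misspelled_words(file)
  output, _status = Open3.capture2(*aspell_command('list'), stdin_data: File.read(file))
  output.split("\n").uniq
end

namespace :spell do
  desc 'Interactively fix spelling mistakes in each HTML file'
  task :check do
    HTML_FILES.each { |file| system(*aspell_command('check', file)) }
  end

  desc 'Fail (for CI) if any HTML file contains a misspelled word'
  task :ci do
    failures = HTML_FILES.map { |file| [file, misspelled_words(file)] }
                         .reject { |_file, words| words.empty? }
    failures.each { |file, words| puts "#{file}: #{words.join(', ')}" }
    abort "Spelling errors found in #{failures.size} file(s)" unless failures.empty?
  end
end
```

    With something like this in a Rakefile, rake spell:check would be the interactive task and rake spell:ci the one to wire into the build.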

    Posted on May 26th, 2009 by Dan in Development, Ruby, Tips & Tricks, Tools.

  • Our Tools & Practices for Remote Collaboration

    Last week, we had Avdi, the newest addition to our team, join us in Boulder, CO. It was great to get some face-to-face time, since Avdi will primarily be working from his home in Pennsylvania while Dan and I continue to work in Boulder.

    We are excited about the benefits of having a distributed team, but we're also aware that there are a number of challenges. As a result, one of the things we worked on last week was figuring out the tools and practices we'll be using to work effectively from across the country. Luckily, both Avdi and Dan have experience working remotely which we can draw upon.

    We evaluated a number of options, but settled on the following tools and practices.

    Practices

    • Daily Standup. Every day at the same time, we all get on video chat. We cover what we did yesterday, what we're working on today, and whether or not we're blocked on anything. The goal is to keep this meeting at 15 min or less.
    • Minimize interruptions. Whenever we need to communicate with each other, we try to do so on the channel that is the least disruptive (and disrupts the fewest team members). Of course, sometimes we need to be disruptive if an issue is pressing, if someone is blocked, or if we need to have high-bandwidth communication (information, especially cues like body language, doesn't come across very effectively on channels like email).
    • Keep it simple. We want to use the smallest number of tools and channels that will allow us to work effectively.

    Channels and Tools

    The channels below are listed from least disruptive to most disruptive.

    Channel: Passive Updates
    Tool: Present.ly
    Properties:
    • Asynchronous
    • Not required reading

    Channel: Email
    Tool: Any email client (in practice, Gmail)
    Properties:
    • Asynchronous
    • Required reading (usually)
    • Sometimes time-sensitive, sometimes not

    Channel: IM
    Tool: Skype
    Properties:
    • Semi-synchronous (but usually synchronous)
    • Usually time-sensitive

    Channel: Voice/video chat
    Tool: Skype
    Properties:
    • Synchronous
    • High bandwidth* (especially video chat)
    • Best for meetings

    * By "high bandwidth", I don't mean that the tool itself requires a lot of TCP/IP traffic (although this is true, it doesn't really matter). What I mean is that we can communicate a lot of information between team members in a short amount of time.

    Other Tools

    • Lighthouse for issue tracking
    • GitHub for source control and our project wiki
    • RealVNC for screen sharing (essential for remote pair programming)

    This is our first attempt at finding a good set of tools and practices for remote collaboration. As time goes on, we'll undoubtedly iterate and improve upon these.

    For another perspective (with a slightly different set of tools), here is a presentation from 2008 about virtual teams.

    What tools and practices have worked (and which have not worked) for your team?

    Posted on April 28th, 2009 by Ben in Development, Devver, Tips & Tricks, Tools.

  • Managing Amazon EC2 with your iPhone

    I wanted a quick way to manage our AWS EC2 instances while out and about. It hasn't happened often, but occasionally I am away from the computer and I need to reboot the instances. Or perhaps I remember our developer cluster isn't being used and want to shut it down to save some money.

    I didn't find anything simple and free with a quick Google search, so in about an hour I wrote a nice little Sinatra app that lets me view our instances and shut down or reboot any specific instance or all of them. The tiny framework actually turned out to be even more useful, as I now have options that let us tail error logs, restart Apache, restart mongrel clusters, or execute any common system administration task.

    I won't be going into detail on how to build an iPhone webapp using Sinatra and iUI, because Ben already created an excellent post detailing all of those steps. In fact, I used his old project as the template when I created this one. I can't begin to explain how amazingly simple it is to build an iPhone webapp using Sinatra, so if you have been thinking of a quick project I highly recommend it.

    Here are some screenshots showing the final app (screenshots courtesy of iPhoney):

    [Screenshot: ec2 manager home view]

    [Screenshot: ec2 manager describe instances view]

    [Screenshot: ec2 manager instance view]

    This app uses the Amazon EC2 API Tools to do all the heavy lifting. So this app assumes that you already have the tools installed and working on the machine you want this app to run on. This normally involves installing the tools and setting up some environment variables like EC2_HOME, so make sure you can run ec2-describe-instances from the machine. After that you should just have to change EC2_HOME in the Sinatra app to match the path where you installed the EC2 tools.
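    To give a feel for how thin the wrapper around the tools can be, here is a sketch of the kind of helper the app relies on. It is not the app's actual code: the default path, the method names, and the bin/ layout under EC2_HOME are assumptions, and the Sinatra routes that would call these helpers are left out.

```ruby
# Hypothetical default; point this at wherever the EC2 API Tools are installed.
EC2_HOME = ENV.fetch('EC2_HOME', '/opt/ec2-api-tools')

# Build the argument list for one of the EC2 command-line tools
# (assumes the tools live in the bin/ directory under EC2_HOME).
def ec2_command(tool, *args)
  [File.join(EC2_HOME, 'bin', tool), *args]
end

# Run a tool and return whatever it printed to standard output.
def run_ec2(tool, *args)
  IO.popen(ec2_command(tool, *args), &:read)
end

# Example of the command a "describe instances" view would run.
puts ec2_command('ec2-describe-instances').join(' ')
```

    A route handler would then call something like run_ec2('ec2-describe-instances') and render the result.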

    Let me know if you have any issues. It is quick and dirty, but I have already found it useful.

    To run the app:
    cmd> ruby -rubygems ./ec2_manager.rb

    Posted on March 5th, 2009 by Dan in Amazon Web Services, Development, Hacking, Ruby, Tips & Tricks, Tools.

  • Boulder CTO January Lunch with Jud Valeski

    The Boulder CTO Lunch meets once a month with a guest speaker and covers topics and questions that startup CTOs should find interesting. This month, the group had Jud Valeski, who is currently the CTO at Gnip. Prior to Gnip, Jud started his professional career in technology at IBM as a software developer. In the past ten years he has spent about five of them coding (primarily C/C++), and about five of them in technical team management/director capacities. The bulk of his experience comes from Netscape Communications (acquired by AOL), with varying durations at IBM, Microsoft, and onebox.com (acquired by OpenWave).

    I will share highlights of the discussion, but since this was an open, free-form discussion, I am sure I couldn't capture everything, and not all my notes are completely accurate. Some of the notes also come from other participants in the conversation, and those views are not necessarily held by the discussion leader.

    A question we ask each visitor, every month, is "What is a CTO?" We do this because every CTO has a different answer.
    "The title is pretty much irrelevant to me." At a small company it doesn't matter. It means something, it means a lot, at larger companies.

    CEO is pure fantasy
    Dev is pure reality
    CTO is the bridge

    Jud is a developer turned CTO. He explains that this can be hard for some, because the CTO role brings management responsibilities, which don't always come naturally to developers.

    If you come from a management-only background, you can have a hard time with the CTO role because you don't have the technical depth to make these decisions.

    Important qualities of a CTO
    * Technical Management
    * Technical Direction (choose the tech to use, the approach, the design)

    CTO role embodies the sense of leadership.

    As CTO, you need a complete understanding of the business.

    If the CTO is scattered that can be fatal. Be clear with prioritization and direction. If a person in the leadership position is confused it leads to a lack of confidence.

    CTOs need to always be able to draw the system architecture. If you can't, you are too detached.

    To show those around you that you know what you are doing, you have to get down in the trenches and make some low-level decisions with them, backing up those decisions by explaining the technical merit of the choices.

    Defining his role at Gnip
    Push technical discussions along. Don't let a topic be discussed too long when a decision needs to be made. Also, don't cut off the discussion before exploring enough opportunities or you can make the wrong decisions.

    At Gnip, defining the technical direction involved setting expectations that Gnip is an agile shop, uses pair programming, and uses lots of testing.

    The thing Jud measures most is how well the team is doing. If it isn't going well, it is his fault.

    A personal metric of Jud's: if Gnip disbands tomorrow, every employee should be better and more knowledgeable than they were before they worked at Gnip.

    Hiring
    * Only hire people that are smarter than you.
    * If you think you are the smartest in the room, there is an ego problem and it is hurting the business.
    * The downside to this is it takes a long time to hire.
    * Gnip might hire 1 out of every 120 resumes it takes the time to seriously consider.
    * It is important to sit down and write code with someone; plenty of people can talk a good talk.
    * Hire generalists because they can learn and do anything.
    * Don't just hire for the tech need, the team personality and fit are just as important.
    * Everyone on the team has to be great people, not just in the office, but out of the office.

    The Gnip hiring process:
    Resume Screen -> 30 minute call -> group tech conversation -> half/full day pair programming

    Jud was once part of a team that went from 15 to 50 people in 4 months, which was growing too fast and was a mistake.

    As a CTO do you have to manage your CEO?
    "Yeah, all the time, absolutely."

    Like any job you need to be careful about what you escalate to the boss. The CEO doesn't want to hear about every little thing.

    What can a CTO do to foster engineering growth?
    * Let employees buy as many books as they want on the company's dime.
    * If you can afford it, do the same for training.
    * Encourage participation in user groups.
    * Churn what people are working on
    * Force developers out of their comfort zones.

    What do you do about slipping schedules?
    "Anyone suggesting software stays on schedule is in fantasy land"

    If you pin motivation, success, or failure to a schedule you are doomed.

    Wins have to come from someplace. If you set a schedule and meet it, celebrate.

    Reconciling schedules with business realities is probably one of the toughest CTO jobs.

    If you have shrink-wrapped software, schedules have a whole different meaning.

    Don't inflate dates just to make the schedule. Any dishonesty anywhere in the chain is bad.

    You need to be able to release internally all the time. Internally always try to release early and release often.

    When falling behind schedule, you have to evaluate the severity of each bug and feature. If there is a known workaround, that changes everything. If there is a workaround then it doesn't have to block a release. If there isn't a workaround, then you have to decide if it makes sense to have a release missing that feature.

    Thanks a bunch to Jud for sharing his thoughts with our CTO group. I also want to thank Tom Chikoore from Filtrbox for organizing the meetings, and Jon Fox and David Cohen who originally started our CTO lunches.

    Posted on February 16th, 2009 by Dan in Boulder, Misc, TechStars, Tips & Tricks.

  • Ruby people on Twitter

    The Ruby community is always moving quickly, changing, and adopting new things. It is good to keep your ear to the ground so you can learn and adopt things that the community is finding really useful. There are a number of ways to do this, like watching the most popular Ruby projects on GitHub, the most active projects on RubyForge, Ruby Reddit, or listening to the Rails podcast. The way I have found most effective is following a good collection of the Ruby community on Twitter, since many of the most active Ruby community members and companies are there. It is where I first heard of many things going on in Ruby, like the recent Merb/Rails merge.

    You can find a great list of 50+ (now 100+) Rubyists to follow on Twitter from RubyLearning. I thought we might as well give out a list of some of the Ruby people Devver.net is following on Twitter.

    technoweenie
    jamis / Jamis Buck
    obie / Obie Fernandez
    chadfowler / Chad Fowler
    engineyard / Engine Yard
    d2h / DHH
    rjs / Ryan Singer
    jasonfried / Jason Fried
    37signals
    foodzie
    fiveruns
    _why / why the lucky stiff
    gilesgoatboy / Giles Bowkett
    dlsspy / Dustin Sallings
    julien51 / julien
    rbates / Ryan Bates
    defunkt / Chris Wanstrath
    chrismatthieu / Chris Matthieu
    littleidea / Andrew Clay Shafer
    headius / Charles Nutter
    bascule / Tony Arcieri
    atmos / Corey Donohoe
    ubermajestix / Tyler Montgomery
    raganwald / Reg Braithwaite
    chriseppstein


    Of course, I have to give a special shout-out to ourselves:

    danmayer / Dan Mayer
    bbrinck / Ben Brinckerhoff
    devver.net

    If we should be following you as well, send us an email at contact@devver.net, and we can hook up on Twitter.

    Posted on January 8th, 2009 by Dan in Misc, Ruby, Tips & Tricks.

  • Boulder CTO December Lunch with Tim Wolters

    The Boulder CTO Lunch meets once a month with a guest speaker and covers topics and questions that startup CTOs should find interesting. This month, the group had Tim Wolters from Collective Intellect come lead the discussion. Tim is a serial entrepreneur currently working on using artificial intelligence and semantic analysis to extract knowledge from unstructured text found in social media. Collective Intellect's customers use this analysis to inform and measure the effectiveness of their PR and marketing strategy.

    Tim is considering working on a book, a startup survival guide for CTOs. Some of his ideas for the book helped lead our discussion during our meeting. I will try to present my notes under the topic headings that Tim mentioned, but since this was an open, free-form discussion, I am sure I couldn't capture everything, and not all my notes are completely accurate.

    The Idea
    People should keep a journal of ideas. Tim keeps a journal which he updates, tags, and adds ideas. On any idea, keep track of what is near term, what resources are needed, what is the cost, and what does the related market look like. (I highly recommend this! Ben and I keep a wiki, which has grown to be an incredibly useful resource and was the initial starting point for our last two companies)

    Ideas should have an "Aha!" factor that makes you wonder why someone else isn't already doing it (or some emotional appeal that makes lives better).

    During the first few years of a startup you can't work on all the ideas that come to mind. That is why it is best to keep a journal: just add little notes to each idea to keep it on the back burner.

    Talk to others about ideas and perhaps have a group move on an idea and lay the groundwork while leading as an adviser.

    Don't be worried about people taking ideas. After starting a few companies you know how hard it is to really bring something to market.

    What about brainstorming for ideas with a group?
    Brainstorming groups have never worked for Tim, it just hasn't worked out. If you have the right people around the table (people that can make things happen), it could work, but Tim hasn't seen it.

    Ideas depend a lot on timing in the marketplace. If the market is moving slowly, you can take your time looking at an idea. If the market is moving really fast, you need to spin it up quickly and get a lot of people working on it to really make a move on the idea.

    Look over your ideas once in a while and see what still really interests you.

    The Role
    As a CTO, you paint a landscape of the product and market.

    There are two kinds of CTOs: tactical and visionary. The tactical CTO is internally focused, manages the team, and makes the day-to-day tactical decisions so the product gets out there. The visionary CTO sees where the product could go in the marketplace, signs early deals and customers, and looks for features that lead towards or away from markets, competitors, and partnerships. The visionary isn't working on architecture but on the market landscape: which partners will benefit the product or get it out sooner.

    CTO should be thinking about things such as the three hardest problems that the company faces, so they know what will also be affecting their competitors.

    People who liked architectural purity but learned that it isn't as important as winning at the business end up making great CTOs.

    CTOs need to stay involved with customers to make decisions about the project innovation and development. Stay active on sales calls, talk with sales people, read all the RFPs.

    Becoming the CTO vs VP of engineering?
    Are you good at managing or not? VP of engineering is a managing role. If not, divide off the management as soon as possible (in his case that wasn't possible until the company was about 20 people).

    Good sales people leverage a CTO as a company evangelist. If you are a CTO you have to be comfortable with presenting and publicity. You will be at conferences, on sales calls, giving presentations, and fundraising. If you aren't comfortable with these things, get comfortable with them.

    time spent:

    • 10% guiding research
    • 30% Sales
    • 30% Partnerships
    • 30% Biz Dev Dealings

    Reputation
    After a few startups, some successes, and an expanded network, things like putting together a team, raising funding, and getting a startup off the ground are much easier the next time.

    It will take 3 to 5 times longer than you think to get a project going if you are an unknown entrepreneur with no reputation.

    Don't try to solve the big unsolvable problems first. The first time, start with smaller problems and develop a reputation while solving them. Angels and VCs aren't funding research efforts, so don't just chase after big impossible goals.

    After a company is bought, it makes sense to make the purchaser successful. It builds on your reputation.

    Become a big fish in a small pond and then move to a bigger pond.

    Putting together the team
    The ideal size for an engineering team is 6-8 people, bigger teams have difficulties maintaining the right amount of communication.

    For hiring, Tim personally sits down with the key hires, and if it is research he does interviews with applicants as well.

    The Traps and Pitfalls of Startup Companies
    3 things that companies get stuck on that can kill the company.

    • Getting overly enamored with the original idea. Startups must be able to adapt.
    • Getting enamored with the research technology, for technology's sake.
    • Getting emotionally tied to architectural purity. Working on layers of abstraction on abstraction to avoid some possible future problem.

    Other things that kill companies (which are kind of like a marriage)

    • Not the right chemistry
    • Bad culture or losing company culture
    • Employees need some sense of allegiance. If they don't have it, cut them immediately
    • Lacks a culture of adaptability
    • Not thinking about how to quickly get to the market and solve problems

    Continual code death march. Sometimes companies go on code marches to get something to the marketplace. This can't be done many nights or it will start taking a toll on other aspects of your life. Strive for balance.

    During a startup, you are continually hitting false summits. You think that if you could just get that contact, solve that roadblock, pass this milestone, or make this key hire, then everything will fall into place. While these are important milestones and you should celebrate them, you are not done. Or rather, it typically doesn't get any easier. What each milestone does is take some risk out, allowing you to go solve bigger or different problems.

    When founders or others in a company argue, which they need to do sometimes, don't do it in front of everyone. Discuss disputes offline, reach agreement and present a unified front to the company.


    Thanks so much to Tim for sharing some of his thoughts with our group. I will leave you with a final question and quote. Someone once asked Tim why he likes to start companies.
    "I like to pick where I work and who I work with."

    Posted on December 11th, 2008 by Dan in Boulder, First Steps, Funding, Misc, Tips & Tricks.

  • Installing and running git-svn on Mac OSX 10.4 Tiger

    I am shocked at how much time it took me to get git-svn working on my Mac. I use MacPorts, which works well most of the time. Sometimes it has problems, which makes me really wish for apt-get on OS X. apt-get has normally worked much better for me, but it can have its issues too. I even occasionally wish for Windows and a simple install.exe which works 95% of the time out of the box. Really, I wish Apple would throw some engineering support behind MacPorts and make the service rock solid.

    I have had git installed and working for a while, but since we are preparing to switch our main project from Subversion (svn) to git, I thought I should start using git-svn. It seemed smart to use git-svn for a while to get used to git before a full switch, so I could fall back on svn in a crunch. The first run of the git-svn command caused this error, and I had no idea how much of my night was about to be wasted...

    Can't locate SVN/Core.pm in @INC

    Searching led to a couple of webpages, the most useful being "getting git to work on OS X Tiger". It offered a quick fix that might work, and a long-route fix. For some lucky people it is just a path problem. I checked whether that was the case for me with the following command:

    PATH=/opt/local/bin:$PATH; git svn

    Unfortunately for me, I got the same error. OK, I need to reinstall SVN with additional bindings...

    > sudo port uninstall -f subversion-perlbindings
    > sudo port install -f subversion-perlbindings

    leading to this error:

    ---> Building serf with target all
    Error: Target org.macports.build returned: shell command " cd "/opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_www_serf/work/serf-0.2.0" && make all " returned error 2
    Command output: /opt/local/share/apr-1/build/libtool --silent --mode=compile /usr/bin/gcc-4.0 -O2 -I/opt/local/include -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -no-cpp-precomp -I. -I/opt/local/include/apr-1 -I/opt/local/include/apr-1 -c -o buckets/aggregate_buckets.lo buckets/aggregate_buckets.c && touch buckets/aggregate_buckets.lo
    libtool: compile: unable to infer tagged configuration
    libtool: compile: specify a tag with `--tag'
    make: *** [buckets/aggregate_buckets.lo] Error 1

    I spent some time searching and eventually found the solution to the serf error. I couldn't read the blog because it wasn't in English, but I could read enough to solve my MacPorts serf install problem. I followed these few lines from the blog:

    $ cd /opt/local/var/macports/build/_opt_local_var_macports_sources_rsync.macports.org_release_ports_www_serf/work/serf-0.2.0
    $ sudo ./configure --prefix=/opt/local --with-apr=/opt/local --with-apr-util=/opt/local
    $ sudo make all
    $ sudo port install serf

    Awesome, I have serf. Now what is next? Back to building svn with perl bindings, that works. Now, let's build git again since svn with perl bindings is finally installed.

    sudo port install git-core +svn

    Which fails because of p5-svn-simple:

    dyld: lazy symbol binding failed: Symbol not found: _Perl_Gthr_key_ptr
    Referenced from: /usr/local/lib/libsvn_swig_perl-1.0.dylib
    Expected in: flat namespace

    dyld: Symbol not found: _Perl_Gthr_key_ptr
    Referenced from: /usr/local/lib/libsvn_swig_perl-1.0.dylib
    Expected in: flat namespace

    Error: Status 1 encountered during processing.

    OK, I need to get p5-svn-simple working. Searching leads to this thread: "MacPort errors related to git". There you will find the amazingly useful comment by Orestis:

    "As mentioned move your libsvn_swig_perl* out of /usr/local/lib AND out of /usr/lib into temporary folders.

    Uninstall and reinstall subversion-perlbindings

    Install p5-svn-simple (and git-core +svn which is what lead me here)

    Move the libsvn_swig_perl files back in /usr/lib and /usr/local/lib (or else git svn won't work)."

    > cd /usr/local
    > mv ./lib/libsvn_swig_perl* ./bak/
    > sudo port install p5-svn-simple

    Sweet, that works now.

    > sudo port install git-core +svn
    > cd /usr/local
    > mv ./bak/libsvn_swig_perl* ./lib/

    Finally I try to run git-svn, only to see the same ERROR I had from the very beginning! I am about to lose it but decide that I should try the quick fix again to see if it is the path issue...

    PATH=/opt/local/bin:$PATH; git svn

    It works! Alright, now it is just a path problem. So I open up my .bash_profile and notice I already have that path included:

    # Setting the path for MacPorts.
    export PATH=/opt/local/bin:/opt/local/sbin:/Applications/MzScheme\ v352/bin:$PATH

    But I also have an additional path added from when I originally built git from source, and it looks like I was running my old broken version of git-svn. So I just had to remove this one line from my .bash_profile

    export PATH=~/projects/git-1.5.6.1:$PATH

    and hours later, with a ton of frustration, I have a fully functioning git-svn.
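
    A quick way to spot this kind of shadowing earlier is to list every copy of a command on the PATH, in lookup order. The following is a minimal Ruby sketch, not part of the original fix; the helper name executables_on_path is made up for illustration, and it ignores empty PATH entries.

```ruby
# Hypothetical helper: list every executable called `name` found on the
# given PATH string, in the order the directories appear in it.
def executables_on_path(name, path = ENV.fetch('PATH', ''))
  candidates = path.split(File::PATH_SEPARATOR).reject(&:empty?).map do |dir|
    File.join(File.expand_path(dir), name)
  end
  candidates.select { |file| File.file?(file) && File.executable?(file) }.uniq
end

# The first entry is the copy that actually runs; later ones are shadowed.
executables_on_path('git').each_with_index do |file, index|
  label = index == 0 ? 'runs    ' : 'shadowed'
  puts "#{label}  #{file}"
end
```

    If a stale build directory shows up ahead of /opt/local/bin in that list, the old binary is the one being run.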

    Now that it is working, you can move on to learning git-svn in 5 minutes.


    Posted on December 9th, 2008 by Dan in Development, Hacking, Misc, Tips & Tricks, Tools.

  • Ruby Beanstalkd distributed worker basics

    At Devver we have a lot of jobs to do quickly, so we distribute our work out to a group of EC2 workers. We have tried and used a number of queuing solutions with Ruby, but in the end beanstalkd seemed to be the best solution for us at the time.

    I have only seen a few posts about the basics of using beanstalkd with Ruby. I decided to make two posts evolving a simple Ruby beanstalkd example into a more complicated example. This way people new to beanstalkd could see how easy it can be to get up and running with distributed processing using Ruby and beanstalkd. Then people that are doing more advanced work with beanstalkd could see some examples of how we are working with it here at Devver. It would also be great for more experienced beanstalkd warriors to share their thoughts as there aren't many examples out in the wild. The lack of examples makes it harder to learn and difficult to decide what the best practices are when working with beanstalkd queues.

    I have also shared two scripts we have found useful while working with beanstalkd. The first, beanstalk_monitor.rb, lets you see the usage statistics for all queues, or monitor a single queue you are interested in. The second, beanstalk_killer.rb, is useful if you want to work on how your code reacts to beanstalkd getting backed up or stalling (in beanstalkd speak, "putting on the brakes"). It was a little harder than I thought to pull everything out of our code and make a simple example, and obviously the example is a bit useless on its own. It should still give a solid idea of how to do the basics of distributing jobs with beanstalkd.

    For those new to beanstalk, there are a few things you will need to know like how to get a queue object, how to put objects on the queue, how to take objects off the queue, and how to control which queue you are working with. For a higher level overview or more detailed information, I recommend checking out the beanstalkd FAQ. The full example code is below, but first taking a look at the basic snippets might help.

    #to work with beanstalk you need to get a client connection
    queue = Beanstalk::Pool.new(["#{SERVER_IP}:#{DEFAULT_PORT}"])
    #by default you will be working on the 'default' tube or queue
    #if we wanted to work on a different queue we could change tubes, like so
    queue.watch('test_queue')
    queue.use('test_queue')
    queue.ignore('default')
    #to put a simple string on a queue
    queue.put('hello queue world')
    #to receive a simple string
    job = queue.reserve
    puts job.body #prints 'hello queue world'
    #if you don't delete the job when you're done, the queue assumes there is an error
    #and the job will show back up on the queue again
    job.delete
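
    The full example further down passes Ruby objects rather than plain strings, using yput and ybody, which convert the object to and from YAML. As a rough illustration of that round trip without a running beanstalkd, here is a sketch using only Ruby's yaml library. The Message class and the two helper names are invented for this example, and the helpers only approximate what the client does.

```ruby
require 'yaml'

# Stand-in message class, invented for this illustration.
class Message
  attr_accessor :project_id, :count
  def initialize(project_id, count = 0)
    @project_id = project_id
    @count = count
  end
end

# Sending side: the object becomes a YAML string, which is the job body.
def to_job_body(obj)
  YAML.dump(obj)
end

# Receiving side: the job body is parsed back into a Ruby object.
def from_job_body(body)
  # Newer versions of Ruby's YAML library only rebuild arbitrary classes
  # through unsafe_load; older versions do it with plain load.
  YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(body) : YAML.load(body)
end

body = to_job_body(Message.new(1, 7))
copy = from_job_body(body)
puts copy.count
```

    Because loading YAML this way can instantiate arbitrary classes, it is only appropriate when everything writing to the queue is trusted.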
    

    How to run this example (on OS X, with macports installed)

    > sudo port install beanstalkd
    > sudo gem install beanstalk-client
    > beanstalkd
    > ruby beanstalk_tester.rb

    Download: beanstalk_tester.rb

    require 'beanstalk-client.rb'
    
    DEFAULT_PORT = 11300
    SERVER_IP = '127.0.0.1'
    #beanstalk orders the jobs in a queue by priority; jobs with the same
    #priority are handled FIFO. In a later example we will use the priority
    #(lower numbers are more urgent; 0 is the highest priority)
    DEFAULT_PRIORITY = 65536
    #TTR (time to run) is how long, in seconds, a worker may hold a reserved job.
    #If a worker dies before completing the work and never calls job.delete,
    #the job returns to the queue once the TTR expires
    TTR = 3
    
    class BeanBase
    
      #To work with multiple queues you must tell beanstalk which queues
      #you plan on writing to (use), and which queues you will reserve jobs from
      #(watch). In this case we also want to ignore the default queue
      def get_queue(queue_name)
        queue = Beanstalk::Pool.new(["#{SERVER_IP}:#{DEFAULT_PORT}"])
        queue.watch(queue_name)
        queue.use(queue_name)
        queue.ignore('default')
        queue
      end
    
    end
    
    class BeanDistributor < BeanBase
    
      def initialize(amount)
        @messages = amount
      end
    
      def start_distributor
        #put all the work on the request queue
        bean_queue = get_queue('requests')
        @messages.times do |num|
          msg = BeanRequest.new(1,num)
          #Take our ruby object and convert it to yml and put it on the queue
          #(the arguments are positional: priority, delay, ttr)
          bean_queue.yput(msg, DEFAULT_PRIORITY, 0, TTR)
        end
    
        puts "distributor now getting results"
        #get all the results from the results queue
        bean_queue = get_queue('results')
        @messages.times do |num|
          result = take_msg(bean_queue)
          puts "result: #{result}"
        end
    
      end
    
      #this will take a message off the queue, process it and return the result
      def take_msg(queue)
        msg = queue.reserve
        #by calling ybody we get the content of the message and convert it from yml
        count = msg.ybody.count
        msg.delete
        return count
      end
    
    end
    
    class BeanWorker < BeanBase
    
      def initialize(amount)
        @messages = amount
        @received_msgs = 0
      end
    
      def start_worker
        results = []
        #get and process all the requests, on the requests queue
        bean_queue = get_queue('requests')
        @messages.times do |num|
          result = take_msg(bean_queue)
          results << result
          @received_msgs += 1
        end
    
        #return all of the results, by placing them on the separate results queue
        bean_queue = get_queue('results')
        results.each do |result|
          msg = BeanResult.new(1,result)
          bean_queue.yput(msg, DEFAULT_PRIORITY, 0, TTR)
        end
    
        #this is just to pass information out of the forked process
        #we return the number of messages we received as our exit status
        exit @received_msgs
      end
    
      #this will take a message off the queue, process it and return the result
      def take_msg(queue)
        msg = queue.reserve
        #by calling ybody we get the content of the message and convert it from yml
        count = msg.ybody.count
        result = count*count
        msg.delete
        return result
      end
    
    end
    
    ############
    # These are just simple message classes that we pass using beanstalks
    # to yml and from yml functions.
    ############
    class BeanRequest
      attr_accessor :project_id, :count
      def initialize(project_id, count=0)
        @project_id = project_id
        @count = count
      end
    end
    
    class BeanResult
      attr_accessor :project_id, :count
      def initialize(project_id, count=0)
        @project_id = project_id
        @count = count
      end
    end
    
    #write X messages on the queue
    numb = 10
    
    recv_count = 0
    
    # Most of the time you will have two entirely separate classes
    # but to make it easy to run this example we will just fork and start our server
    # and client separately. We will wait for them to complete and check
    # if we received all the messages we expected.
    puts "starting distributor"
    server_pid = fork {
      BeanDistributor.new(numb).start_distributor
    }
    
    puts "starting client"
    client_pid = fork {
      BeanWorker.new(numb).start_worker
    }
    
    Process.wait(client_pid)
    recv_count = $?.exitstatus
    puts "client finished received #{recv_count} msgs"
    if(numb==recv_count)
      puts "received the expected number of messages"
    else
      puts "error didn't receive the correct number of messages"
    end
    
    Process.wait(server_pid)
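
    Passing the message count back through the exit status works for this small example, but an exit status only holds values from 0 to 255, so a count of 256 or more would wrap around. One alternative, sketched here as an assumption rather than how the example above works, is to send the value back through a pipe. The helper name count_from_child is hypothetical.

```ruby
# Hypothetical helper: run a block in a forked child and read back the
# integer it returns through a pipe, instead of through the exit status.
def count_from_child
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts(yield)   # send the block's result to the parent
    writer.close
    exit!(0)             # leave without running the parent's at_exit hooks
  end
  writer.close           # the parent only reads
  value = reader.read.to_i
  reader.close
  Process.wait(pid)
  value
end

# An exit status keeps only the low 8 bits, so 300 comes back as 44.
pid = fork { exit!(300) }
Process.wait(pid)
puts $?.exitstatus

# The pipe has no such limit.
puts count_from_child { 300 }
```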
    


    Posted on October 28th, 2008 by Dan in Development, Devver, Hacking, Tips & Tricks.

  • Tracking down open files with lsof

    The other day I was running into a weird error on Devver. After about twenty test runs on the system, the component that actually runs individual unit tests was crashing with "Too many open files - (Errno::EMFILE)".

    Unfortunately, I didn't know much more than that. Which files were being kept open? I knew that this component loaded quite a few files, and that by default, OS X only allows 256 open file descriptors (ulimit -n will tell you the default on your system). If this was a valid case of needing to load more files, I could just up the limit using ulimit -n <bigger_number>.
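
    The same limit can also be read from inside Ruby with Process.getrlimit, which is handy if you want to log it next to the error. A minimal sketch:

```ruby
# The soft limit is what the process is held to right now; the hard limit
# is the ceiling the soft limit can be raised to.
soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
puts "open file limit: soft=#{soft} hard=#{hard}"
```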

    Fortunately, a quick Google or two pointed the way to lsof. Unfortunately, my Unix-fu is never nearly as good as I wish and I didn't know much about this handy utility. But I quickly discovered that it's very useful for tracking down problems like this. I quickly used ps to find the PID of the Devver process and then a quick lsof -p <PID> displayed all the files that the process had open. So easy!

    Sure enough, there were a ton of redundant file handles to the file that we use to store information about the Devver run. Armed with this information, it was easy to find the buggy code where we called File.open but failed to ever close the file.
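
    The usual way to avoid this kind of leak is the block form of File.open, which closes the file when the block exits, even if an exception is raised. The sketch below contrasts the two forms; it uses a throwaway temp file as a placeholder, not the real Devver run file.

```ruby
require 'tempfile'

tmp = Tempfile.new('run_info')   # placeholder file for the example
path = tmp.path

# Leaky form: the handle stays open until someone calls close
# (or the object happens to be garbage collected).
leaky = File.open(path)
puts leaky.closed?   # false
leaky.close          # the call the buggy code was missing

# Block form: Ruby closes the file as soon as the block exits.
safe = nil
File.open(path) { |f| safe = f }
puts safe.closed?    # true
```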

    Unfortunately, I still don't know how to write a good unit test for this case. I guess I could do something ugly like call system("lsof -p <PID> | wc -l") before and after calling the code and make sure the number of descriptors stays constant, but that's really ugly. Is there a way to test this within Ruby? I'm open to ideas.
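
    One possible approach, offered as a sketch rather than a definitive answer: ObjectSpace can enumerate live File objects, so a test can compare the set of open files before and after running the code under test. The helper names open_files and leaked_files are invented for this example.

```ruby
require 'tempfile'

# Hypothetical helper: File objects that are currently open in this process.
def open_files
  ObjectSpace.each_object(File).reject(&:closed?)
end

# Hypothetical helper: files that were opened inside the block and are
# still open when it returns.
def leaked_files
  before = open_files
  yield
  open_files - before
end

tmp = Tempfile.new('leak_check')

# Opened and never closed: reported as a leak.
handle = nil
leaked = leaked_files { handle = File.open(tmp.path) }
puts leaked.size   # 1
leaked.each(&:close)

# Block form closes the file, so nothing is reported.
puts leaked_files { File.open(tmp.path) { |f| f.read } }.size   # 0
```

    This only sees files opened through Ruby's File class, and a leaked file with no remaining references may be garbage collected (and so closed) before the check runs, so it is best suited to leaks that stay reachable, like handles stored in an instance variable.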

    Still, it's always good to learn more about a powerful Unix tool. I'm constantly amazed by the power and depth of the Unix tool set.


    Posted on October 9th, 2008 by Ben in Development, Hacking, Testing, Tips & Tricks.