When we first started 43 Things we placed all our images on every web server. That quickly became annoying to back up (using Amanda), so I moved it all to one machine that the other servers all mounted via NFS. Eventually we ran out of disk and needed something I wouldn’t need to touch for a long time, so I chose MogileFS.
Since there wasn’t a MogileFS library for Ruby, I wrote mogilefs-client. Now I needed a way to get the images back out.
FastCGI wasn’t fast enough: copying the image into the process before sending it was a big performance hit. Using mod_ruby created httpd processes that were much too large. The best way to go was a web server that used sendfile() and let me easily map URIs to MogileFS keys.
Ordinarily Perlbal is used with MogileFS, but Perlbal disagrees with FreeBSD. Rather than trying to fix Perl modules, I decided to use WEBrick instead. WEBrick already had all the important features of Perlbal with the exception of sendfile(), and I had a minty-fresh MogileFS library for Ruby.
First, I wrote socket_sendfile (but later learned about ruby-sendfile which supports more platforms) and integrated it with WEBrick. Then I wrote a WEBrick servlet to map URIs into MogileFS keys that would then turn around and send the file to the client using sendfile().
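The servlet's job can be sketched in plain Ruby without WEBrick or MogileFS. Everything here is an assumption made for illustration: the "image:ID:SIZE" key scheme, the method names, and the Hash standing in for a MogileFS lookup are all hypothetical. IO.copy_stream stands in for socket_sendfile because it copies from a file to a socket without reading the file into a Ruby string first.

```ruby
require "socket"
require "tempfile"

# Hypothetical URI layout: /images/<id>/<size>.<ext>
IMAGE_PATH = %r{\A/images/(\d+)/(\w+)\.(?:jpg|png|gif)\z}

# Map a request path to a storage key, or nil if the path is not an image URI.
# The "image:<id>:<size>" key format is an assumption, not the real scheme.
def mogile_key_for(path)
  match = IMAGE_PATH.match(path)
  return nil unless match
  "image:#{match[1]}:#{match[2]}"
end

# store is a Hash of key => local file path, standing in for a MogileFS lookup.
# client is any writable socket. Returns the HTTP status that was sent.
def serve_image(path, store, client)
  key = mogile_key_for(path)
  file_path = key && store[key]
  unless file_path
    client.write("HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
    return 404
  end

  File.open(file_path, "rb") do |file|
    client.write("HTTP/1.1 200 OK\r\nContent-Length: #{file.size}\r\n\r\n")
    # Stream the file straight to the socket instead of loading it first.
    IO.copy_stream(file, client)
  end
  200
end
```

Keeping the URI-to-key mapping in its own method means the mapping can be checked without a socket or a storage backend.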
With this setup I managed to serve images about half as fast as Apache serves raw files, but the load was too high for a single-process WEBrick server. To distribute the load I reworked the default WEBrick server to fork multiple processes all listening on the same server socket. We ended up with each of our four servers running eight WEBrick processes.
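The forking idea can be shown with nothing but Ruby's socket library. This is a minimal sketch, not the reworked WEBrick server itself: the prefork method name and the block-based handler are hypothetical. The parent opens the listening socket once, and each forked child inherits it and calls accept on it, so the kernel hands each incoming connection to whichever worker is waiting.

```ruby
require "socket"

# Fork `count` workers that all accept connections on the same listening
# socket. The block handles one client connection. Returns the worker pids.
def prefork(server, count)
  Array.new(count) do
    fork do
      # Exit at once on TERM, skipping any at_exit hooks inherited from the parent.
      trap("TERM") { exit! }
      loop do
        client = server.accept
        yield client
        client.close
      end
    end
  end
end
```

A caller would create a TCPServer, pass it to prefork with a block that writes the response, and later send TERM to the returned pids and wait on them to shut the workers down.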
Adding support for If-Modified-Since also improved image serving speeds, but wasn’t quite enough, so I threw in a dirty trick. Instead of going to MogileFS to verify the image on an If-Modified-Since request I just return a 304 immediately. I can trust the web browser to do the right thing and save myself a trip to MogileFS and a stat() since our images won’t disappear unless their links also disappear.
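The dirty trick is small enough to sketch on its own. The method name, the headers Hash, and the block that fetches the image are all hypothetical stand-ins. The point is only that the 304 branch returns before the storage lookup is ever made.

```ruby
require "time"

# Returns [status, headers, body]. The block performs the expensive lookup
# and returns [body, mtime]. It is skipped whenever the browser sends
# If-Modified-Since, on the assumption that its cached copy is still good.
def respond_to_image_request(request_headers)
  return [304, {}, nil] if request_headers["If-Modified-Since"]

  body, mtime = yield
  [200, { "Last-Modified" => mtime.httpdate }, body]
end
```

The first request sends Last-Modified so that the browser has a date to echo back. Every conditional request after that is answered without touching storage.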
The last two tricks I used to speed up WEBrick were disabling access logging and giving WEBrick a dedicated IP via the load balancer. Removing the extra work of logging hits resulted in a significant speedup, around twenty percent. Sending requests directly to WEBrick gave another twenty percent speedup since we weren’t running through Apache’s mod_proxy.
I’ve packaged up all my WEBrick speed-ups into the webrick-high-performance gem. Unfortunately the sendfile() code is still FreeBSD-specific. (I don’t have a Linux machine so I can’t test a socket_sendfile written for Linux.)
In order to get some speed back I’m using a sendfile(2) extension written using RubyInline. Due to the double dispatch, images end up being noticeably slower, but at least I won’t have to restart whole web servers every couple of hours to get images back on track.