Image Serving with WEBrick

Eric Hodel | Tue, 28 Mar 2006 00:44:07 GMT


When we first started 43 Things we placed all our images on every web server. That quickly became annoying to back up (using Amanda), so I moved them all to one machine that the other servers mounted via NFS. Eventually we ran out of disk space and needed something I wouldn’t have to touch for a long time, so I chose MogileFS.

Since there wasn’t a MogileFS library for Ruby, I wrote mogilefs-client. Now I needed a way to get the images back out.

FastCGI wasn’t fast enough: copying each image into the process before sending it was a big performance hit. Using mod_ruby created httpd processes that were much too large. The best option was a web server that used sendfile() and let me easily map URIs to MogileFS keys.
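To make the copying cost concrete, here is a minimal Python sketch (not the Ruby code this post describes) contrasting the two approaches. `send_file_copying` reads the whole image into the process before writing it to the socket, while `send_file_zero_copy` uses `socket.sendfile()`, which calls `os.sendfile()` (the sendfile(2) system call) where the platform provides it and falls back to ordinary sends otherwise. Both function names are illustrative.

```python
import socket


def send_file_copying(sock: socket.socket, path: str) -> int:
    """Read the whole file into this process, then write it to the socket."""
    with open(path, "rb") as f:
        data = f.read()  # every byte of the image passes through user space
    sock.sendall(data)
    return len(data)


def send_file_zero_copy(sock: socket.socket, path: str) -> int:
    """Let the kernel move the bytes from the file to the socket.

    socket.sendfile() uses os.sendfile() when the platform provides it and
    falls back to send() otherwise; it returns the number of bytes sent.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)
```

Both functions deliver the same bytes to the client. The difference is how many times the data is copied through the serving process, which matters when that process is busy serving many images.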

Ordinarily Perlbal is used with MogileFS, but Perlbal disagrees with FreeBSD. Rather than trying to fix Perl modules, I decided to use WEBrick instead. WEBrick already had all the important features of Perlbal except sendfile(), and I had a minty-fresh MogileFS library for Ruby.

First, I wrote socket_sendfile (and only later learned about ruby-sendfile, which supports more platforms) and integrated it with WEBrick. Then I wrote a WEBrick servlet that maps URIs to MogileFS keys and sends the corresponding file to the client using sendfile().
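The servlet’s job reduces to translating a request path into a storage key and returning the location of the stored file. The Python sketch below assumes a hypothetical URI scheme (`/images/<class>/<id>.<ext>`) and a hypothetical key format. It uses a plain dictionary as a stand-in for the MogileFS lookup, so it reflects neither the real 43 Things layout nor the mogilefs-client API.

```python
import re
from typing import Mapping, Optional, Tuple

# Hypothetical URI scheme: /images/<class>/<numeric id>.<extension>
IMAGE_URI = re.compile(r"^/images/(?P<cls>[a-z_]+)/(?P<id>\d+)\.(?P<ext>jpg|png|gif)$")


def uri_to_key(uri: str) -> Optional[str]:
    """Map a request path to a storage key, or None if it is not an image URI."""
    match = IMAGE_URI.match(uri)
    if match is None:
        return None
    # Hypothetical key format, e.g. "goal:1234.jpg"
    return f"{match['cls']}:{match['id']}.{match['ext']}"


def resolve(uri: str, store: Mapping[str, str]) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, file path to send) for a request URI.

    `store` stands in for the MogileFS lookup: it maps keys to file paths.
    """
    key = uri_to_key(uri)
    if key is None or key not in store:
        return 404, None
    return 200, store[key]
```

Because the pattern is anchored and only accepts digits for the id, paths such as `/images/goal/../secret.jpg` are rejected before any storage lookup happens.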

With this setup I got about half the speed of Apache serving raw files, but the load was too high for a single-process WEBrick server. To distribute the load, I reworked the default WEBrick server to fork multiple processes that all listen on the same server socket. We ended up with each of our four servers running eight WEBrick processes.
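A prefork server binds a single listening socket and then forks workers that all block in accept() on it. Each incoming connection is accepted by exactly one of them. Below is a minimal Python sketch of that pattern. It is POSIX only because it relies on `os.fork()`. `start_prefork_server`, `stop_workers`, and the `handle` callback are hypothetical names, and this is not the code in webrick-high-performance.

```python
import os
import signal
import socket
from typing import Callable, List, Tuple


def start_prefork_server(
    handle: Callable[[socket.socket], None],
    workers: int = 8,
    host: str = "127.0.0.1",
    port: int = 0,
) -> Tuple[int, List[int]]:
    """Bind one listening socket, then fork `workers` processes that accept() on it.

    Returns the bound port and the worker PIDs. POSIX only (uses os.fork).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(128)  # connections queue here until some worker accepts them
    bound_port = listener.getsockname()[1]

    pids: List[int] = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:
            # Worker: loop forever on the socket inherited from the parent.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        try:
                            handle(conn)
                        except Exception:
                            pass  # a failed request must not take the worker down
            finally:
                os._exit(0)  # never return into the parent's code
        pids.append(pid)

    listener.close()  # the parent only supervises; workers keep their copies
    return bound_port, pids


def stop_workers(pids: List[int]) -> None:
    """Terminate the workers and reap them."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

The default of eight workers mirrors the per-server count in the post. The right number depends on CPU count and on how long each request spends waiting on storage.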

Adding support for If-Modified-Since also improved image serving speeds, but it wasn’t quite enough, so I threw in a dirty trick. Instead of going to MogileFS to verify the image on an If-Modified-Since request, I just return a 304 immediately. I can trust the web browser to do the right thing and save myself a trip to MogileFS and a stat(), since our images won’t disappear unless the links to them also disappear.
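The shortcut amounts to checking for the header before doing any storage work. This Python sketch uses hypothetical names, and `fetch` stands in for the MogileFS lookup plus read. It answers 304 to any conditional request without consulting storage, which is safe only under the post’s assumption that an image never changes once its URL exists. A real server would also need to send a `Last-Modified` header on 200 responses so that browsers have a date to send back.

```python
from typing import Callable, Mapping, Optional, Tuple


def respond(
    headers: Mapping[str, str],
    uri: str,
    fetch: Callable[[str], Optional[bytes]],
) -> Tuple[int, bytes]:
    """Return (status, body) for an image request.

    Shortcut: if the browser sent If-Modified-Since at all, it already has a
    cached copy, so answer 304 without touching storage. This assumes images
    are never modified in place.
    """
    if any(name.lower() == "if-modified-since" for name in headers):
        return 304, b""
    body = fetch(uri)  # the expensive path: storage lookup plus read
    if body is None:
        return 404, b""
    return 200, body
```

A conditional request never reaches `fetch`. Only first-time requests pay for the storage lookup.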

The last two tricks I used to speed up WEBrick were disabling access logging and giving WEBrick its own IP address on the load balancer. Removing the extra work of logging hits gave a significant speedup of around twenty percent. Sending requests directly to WEBrick gave another twenty percent, since they no longer passed through Apache’s mod_proxy.
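In WEBrick, access logging is usually turned off by passing an empty `:AccessLog` array to `WEBrick::HTTPServer.new`. The Python sketch below shows the same idea with the standard library’s `http.server`. Overriding `log_request()` drops the per-request line that `send_response()` would otherwise write to stderr, and `log_error()` still reports failures. The handler name and the fixed response body are placeholders.

```python
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class QuietImageHandler(BaseHTTPRequestHandler):
    """Serves a placeholder image body without writing an access-log line."""

    def do_GET(self) -> None:
        body = b"fake image"  # placeholder for real image bytes
        self.send_response(200)  # normally also logs the request via log_request()
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_request(self, code="-", size="-") -> None:
        # Skip access logging entirely; log_error() still reaches stderr.
        pass


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build a server using the quiet handler; call serve_forever() to run it."""
    return ThreadingHTTPServer(("127.0.0.1", port), QuietImageHandler)
```

With this handler, a request produces no output on stderr unless something goes wrong.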

I’ve packaged up all my WEBrick speed-ups into the webrick-high-performance gem. Unfortunately, the sendfile() code is still FreeBSD-specific. (I don’t have a Linux machine, so I can’t test a socket_sendfile written for Linux.)

9 comments


Have you looked into working with Zed Shaw on implementing some of this in Mongrel? I noticed his latest release has sendfile support. Coincidence?

Adam said about 13 hours later

Nope, not a coincidence. My release of socket_sendfile prompted the author of ruby-sendfile to pipe up, and then Zed Shaw integrated ruby-sendfile into Mongrel.

Mongrel adds an extra dependency, and I’ve got enough software to track, so I’m unlikely to use it.

Eric Hodel said about 18 hours later

I’m curious why you didn’t want to use squid.

john said about 20 hours later

Squid can’t look up files in MogileFS. It’s also another piece of software I’ll have to maintain. WEBrick can do the job all by itself, so that’s the best solution.

Eric Hodel said about 20 hours later

I see. So you chose WEBrick (mostly) because of MogileFS, and chose MogileFS because adding NFS volumes wouldn’t work for you?

I would think that having MogileFS would count as another piece to have to maintain.

john said about 21 hours later

Adding NFS volumes to distribute images runs into the problems that MogileFS solves (redundancy, scalability and reliability for storing many small files), so I need it.

Throwing extra software on top of the minimum necessary to make MogileFS work runs afoul of YAGNI, so I’m going to avoid it until it becomes absolutely necessary.

Eric Hodel said about 22 hours later

I see.

Where I work we simply add more filesystems (NFS or otherwise), and we store quite a lot of small-ish files at a decently large request rate (even with our caching), and we do have those things: scalability, reliability, redundancy. (Volumes are synced, and an image is never in fewer than two places.)

But it looks like MogileFS works well for you, and it’s a novel approach. Not being able to cache is a no-go for us.

(That said, I assume you could front WEBrick with a cache like squid in reverse-proxy mode if your request rate gets high enough.)

john said about 22 hours later

Are you using custom software or something third-party for your file storage?

Supporting If-Modified-Since (especially the cheating way) drastically reduces the load on the WEBrick processes. When the load gets high enough, we’ll add squid or Akamai.

Eric Hodel said about 24 hours later

We’re using NetApps to store files, but it could be any ole NFS volumes, really.

Custom? Yes and no. When an image gets uploaded, a process that runs outside of PHP actually writes the file. But the main job of that storage process is just to choose an available NFS volume to write the file to.

As for serving, image requests are rewritten with mod_rewrite to map to the volumes’ mount points, which are unique. Apache serves to squid, and squid serves to the people. :)

Thanks for sharing the details.

john said 1 day later
