One of my long-running problems with Rails (and Ruby in general) is that it’s difficult to debug memory leaks. I’ve had a number of cases where I’ve stuck something into a long-lived array or hash and discovered much later that my Ruby process was eating over 100 MB of RAM. While ps makes it easy to see when Ruby’s using lots of RAM, actually figuring out where it went is a lot harder.
[…] I asked the Seattle Ruby Group for help, and Ryan Davis gave me a quick little memory leak spotter that he uses. I made a few additions to it, and it helped me discover that my Typo development tree was leaking 1-3 strings per hit.
—scottstuff: Memory leak profiling with Rails
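Ryan's actual script isn't reproduced here, but the core idea can be sketched with plain ObjectSpace (a hedged reconstruction, not his code): snapshot per-class object counts, do some work, then diff the snapshots to see which classes are accumulating.

```ruby
# Count live objects per class. ObjectSpace walks the heap, so run
# GC first to avoid counting garbage that is about to be collected.
def object_counts
  counts = Hash.new(0)
  ObjectSpace.each_object { |obj| counts[obj.class] += 1 }
  counts
end

GC.start
before = object_counts

leak = []                                  # simulate a long-lived array that leaks
100.times { leak << "leaked string" }

GC.start
after = object_counts

# Classes whose populations grew between snapshots are leak suspects.
growth = after.map { |klass, n| [klass, n - before[klass]] }
              .select { |_, delta| delta > 0 }
              .sort_by { |_, delta| -delta }
growth.first(5).each { |klass, delta| puts "#{klass}: +#{delta}" }
```

Run between requests (or hits, as in the quote above), the diff points straight at the class that's piling up.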
I just used Scott and Ryan’s script to find a memory leak in our backend jobs related to cached_model. We have the local cache enabled for our sites because we reset the cache every page request. In our backend jobs we don’t clear the local cache, so we end up with a Hash holding every ActiveRecord object we’ve ever retrieved from the database.
So if the warnings in the documentation weren’t enough, here it is again. If you’re using CachedModel in a backend job, be sure to disable the local cache or call cache_reset periodically to allow items to be garbage collected.
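The failure mode is easy to reproduce in plain Ruby (this illustrates the leak, it is not the cached_model API itself): a process-local Hash cache that is never cleared pins every record it has ever seen, so nothing can be garbage collected until the Hash is reset.

```ruby
# A process-local cache, standing in for CachedModel's local cache.
LOCAL_CACHE = {}

def fetch_record(id)
  LOCAL_CACHE[id] ||= "record #{id}"   # stand-in for a database load
end

# A long-running backend job touches many records...
1_000.times { |i| fetch_record(i) }
size_before_reset = LOCAL_CACHE.size   # every record ever fetched is pinned

# ...so it must periodically drop the cache, which is what a
# cache_reset call buys you between jobs.
LOCAL_CACHE.clear
size_after_reset = LOCAL_CACHE.size
```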
We consume Flickr photos as part of 43 Places and use their tags to automatically add them to places. One new user creepily found his own face as his hometown's featured photo.
When we first started 43 Things we placed all our images on every web server. That quickly became annoying to back up (using Amanda), so I moved it all to one machine that the other servers all mounted via NFS. Eventually we ran out of disk and needed something I wouldn’t need to touch for a long time, so I chose MogileFS.
Since there wasn’t a MogileFS library for Ruby I wrote mogilefs-client. Now I needed a way to get the images back out.
FastCGI wasn’t fast enough; copying the image into the process before sending it was a big performance hit. Using mod_ruby created httpd processes that were much too large. The best way to go was a web server using sendfile() that I could easily map URIs to MogileFS keys with.
Ordinarily Perlbal is used with MogileFS, but Perlbal disagrees with FreeBSD. Rather than trying to fix Perl modules I decided to use WEBrick instead. WEBrick already had all the important features of Perlbal except sendfile(), and I had a minty-fresh MogileFS library for Ruby.
First, I wrote socket_sendfile (but later learned about ruby-sendfile which supports more platforms) and integrated it with WEBrick. Then I wrote a WEBrick servlet to map URIs into MogileFS keys that would then turn around and send the file to the client using sendfile().
With this setup I managed about half the speed of Apache serving raw files, but the load was too high for a single-process WEBrick server. To distribute the load I reworked the default WEBrick server to fork multiple processes, all listening on the same server socket. We ended up with each of our four servers running eight WEBrick processes.
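The preforking pattern above can be sketched with a plain TCPServer (names and worker bodies here are illustrative, not the webrick-high-performance code): the parent opens one listening socket, then forks workers that all call accept on it, and the kernel hands each incoming connection to exactly one worker.

```ruby
require 'socket'

NUM_WORKERS = 2

server = TCPServer.new('127.0.0.1', 0)   # port 0: let the OS pick a free port
port   = server.addr[1]

# Fork workers AFTER the socket exists so they all inherit it.
pids = NUM_WORKERS.times.map do
  fork do
    loop do
      client = server.accept             # shared socket; kernel load-balances
      client.write "hello from worker #{Process.pid}\n"
      client.close
    end
  end
end

# Parent: exercise the workers, then shut them down.
responses = 2.times.map do
  sock  = TCPSocket.new('127.0.0.1', port)
  reply = sock.read
  sock.close
  reply
end
responses.each { |r| puts r }

pids.each { |pid| Process.kill(:TERM, pid); Process.wait(pid) }
```

Because the kernel does the balancing, no dispatcher process sits in the request path, which is what makes this cheaper than proxying.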
Adding support for If-Modified-Since also improved image serving speeds, but wasn’t quite enough, so I threw in a dirty trick. Instead of going to MogileFS to verify the image on an If-Modified-Since request I just return a 304 immediately. I can trust the web browser to do the right thing and save myself a trip to MogileFS and a stat() since our images won’t disappear unless their links also disappear.
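The dirty trick reduces to a one-branch decision (a hedged sketch; the helper name and header Hash are illustrative): because an image URL never points at different bytes, any conditional request can be answered immediately.

```ruby
# Decide a response status from the request headers alone, without
# touching MogileFS or the filesystem.
def image_status(headers)
  if headers['If-Modified-Since']
    304   # the browser already has the bytes; skip MogileFS and stat()
  else
    200   # full fetch: map the URI to a key, then sendfile() (not shown)
  end
end

puts image_status('If-Modified-Since' => 'Sat, 05 Nov 2005 00:00:00 GMT')
```

This is only safe under the invariant stated above: an image never changes while its link exists.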
The last two tricks I used to speed up WEBrick were disabling access logging and giving WEBrick a dedicated IP via the load balancer. Removing the extra work of logging hits resulted in a significant speedup, around twenty percent. Running requests directly to WEBrick was another twenty percent speedup since we weren’t running through Apache’s mod_proxy.
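Disabling logging is a configuration fragment (hedged; these are WEBrick's standard options, but the port is illustrative): an empty AccessLog array skips per-hit log formatting entirely, and a null Logger silences the server log.

```ruby
require 'webrick'

config = {
  :Port      => 8080,
  :AccessLog => [],                          # no per-hit access logging
  :Logger    => WEBrick::Log.new(File::NULL) # discard server log output
}
# server = WEBrick::HTTPServer.new(config)
# server.start
```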
I’ve packaged up all my WEBrick speed-ups into the webrick-high-performance gem. Unfortunately the sendfile() code is still FreeBSD-specific. (I don’t have a Linux machine so I can’t test a socket_sendfile written for Linux.)
Two packages that I use to speed up WEBrick image serving are now freshly released. Unfortunately I haven’t tested them on any platform other than FreeBSD, so please file bugs if they don’t work for you.

$ sudo gem install socket_sendfile
$ sudo gem install socket_accept_filter
socket_accept_filter makes it easy to set the SO_ACCEPTFILTER socket option so you can enable the accf_http(9) and accf_data(9) accept filters. Accept filters delay the return from accept(2) until enough data has arrived on the socket for processing.
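Under the hood this is a single setsockopt call, which can be sketched by hand (assumes FreeBSD with the accf_http module loaded; the guard makes it a no-op elsewhere). struct accept_filter_arg is a 16-byte filter name plus a 240-byte argument, and accf_http registers itself under the name "httpready".

```ruby
require 'socket'

server = TCPServer.new('127.0.0.1', 0)

# SO_ACCEPTFILTER only exists on FreeBSD builds of Ruby.
if Socket.const_defined?(:SO_ACCEPTFILTER)
  arg = ['httpready', ''].pack('a16a240')  # struct accept_filter_arg
  server.setsockopt(Socket::SOL_SOCKET, Socket::SO_ACCEPTFILTER, arg)
end
```

With the filter set, accept(2) doesn't return until a full HTTP request has arrived, so no worker process wastes time on a half-open connection.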
Shortly I’ll have enough software released to do a full write-up of high-volume image serving with WEBrick.
You can install it as a gem:

$ sudo gem install mogilefs-client
Or go download mogilefs from Rubyforge.
WARNING! I’ve only been able to test NFS mode in production, so HTTP mode is not proven to work. If you find any bugs in it, please report them at the Rubyforge tracker.
Every part of the system configuration you change belongs under version control (with a few exceptions). If you’re going to be making changes to your configuration you might do something wrong and need to roll back. Later you might wonder why a change was made, or who made it. Version control will perform CYA duties for you.
There are a few things you probably don’t want under version control. /etc/master.passwd shouldn’t be flying across the wire (and Kerberization or similar works much better for distributing passwords). Sudo will get mad if you go and touch /usr/local/etc/sudoers inappropriately, especially if it has the wrong owners.
Configuration files in /etc, /usr/local/etc, /boot, custom rc.d and periodic scripts, anything you’re going to change, add or even break needs to be under version control. (I haven’t figured out a good way of putting crontabs under version control, ideas?)
For The Robot Co-op, each machine’s configuration lives in its own branch of a Subversion repository, allowing care-free copying of changes between machines. A change to the httpd.conf on one machine is a commit and a couple of merges away from being accurately applied on all the machines. No typos from multiple manual changes.
I’ve seen a lot of comments asking for information on our software setup, so here it is. If you’d like more detail just ask, I’ll fill you in as best I can either in a comment or in a future post.
UPDATE: Added link to Wikipedia’s MySQL configuration.
| Qty | Hardware | RAM | Disk | Role |
|-----|----------|-----|------|------|
| 4 | Dual 3GHz Xeon | 6GB | 70GB RAID 1 | Apache, FastCGI, MogileFS storage node, memcached, image serving |
| 1 | Dual 3GHz Xeon | 2GB | 70GB RAID 1 | Staging, mail, backend jobs |
| 1 | Dual Opteron 246 | 12GB | 5x 73GB in RAID 5 | MySQL |
The four web servers are more fluke than planning; we don’t need the capacity they provide just yet. We started with two web servers, a database server and a staging/mail/backend server, all dual 3GHz Xeons. We then added a third web server, and after that the Opteron MySQL box. The old database server was recently repurposed as a web server.
Site traffic is currently spread across all four web boxes by a hardware load balancer of unknown manufacture; each box runs all of our sites. Eventually we’ll switch to running 43 Things on a pair of machines and all the other sites on the remaining machines.
To get some speed back I’m using a sendfile(2) extension written with RubyInline. Due to the double dispatch, images end up noticeably slower, but at least I won’t have to restart whole web servers every couple of hours to get images back on track.