The Ruby Stdlib is not a Ghetto

drbrain | Wed, 24 Nov 2010 23:21:00 GMT

I have read Mike Perham's post on ruby's stdlib and strongly disagree.

Mike presents only the thinnest of arguments: that the libraries on his list are "old and crufty". While many of the libraries are old, few are crufty, and he does not present any list of libraries that should replace them. I find this very troublesome.

It's very easy to say "this sucks". It's only slightly less easy to say "this sucks and here is why", and it's hard to say "this sucks, here is why, and I'm going to do something about it".

Mike picks out Net::HTTP as his prime example, saying "Its performance and API are just terrible", but he does not back his performance claims with any benchmarks or his API claims with any examples. The rest of his libraries fall under "old and crufty", but I've heavily used many of the libraries listed and don't share those experiences.

Net::HTTP Performance

I have benchmarked Net::HTTP against the curb gem and it is far from slow. When I performed these benchmarks I found that curb was terribly broken. Curb didn't use persistent connections (and as a consequence maintained the open connection until GC) and didn't support manual closing of connections as a workaround to the first issue. This made curb's performance atrociously bad.

These issues were fixed rapidly: curb 0.7.3, released under two weeks later, resolved the performance problems. Net::HTTP is about 2.4 times slower than curb 0.7.3. I find this damn good when comparing a pure-ruby library to a C extension.

Of course, if you're serious about performance you need to use persistent connections (which is the main reason curb has historically been said to be ludicrously faster than Net::HTTP), so use the net-http-persistent gem. The implementation is quite clean and simple and the documentation provides recommended Net::HTTP APIs to use with it.
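As a rough illustration of why connection reuse matters, the sketch below uses only the standard library: Net::HTTP.start with a block keeps a single connection open for every request made inside the block. It does not use the net-http-persistent gem's own API. The throwaway local server and the method names start_counting_server and fetch_all are assumptions made up for this example so that it runs without network access.

```ruby
require 'net/http'
require 'socket'

# Demonstration-only keep-alive server (an assumption for this sketch).
# Returns its port and a one-element array counting accepted TCP connections.
def start_counting_server
  server = TCPServer.new '127.0.0.1', 0
  accepted = [0]

  Thread.new do
    loop do
      sock = server.accept
      accepted[0] += 1

      Thread.new(sock) do |client|
        # Answer every request on this socket until the client closes it.
        while (line = client.gets)
          next unless line == "\r\n" # blank line ends the request headers
          client.write "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok"
        end
        client.close
      end
    end
  end

  [server.addr[1], accepted]
end

# Fetch several paths over one connection. The connection is opened when
# the block starts and closed when it returns.
def fetch_all(host, port, paths)
  Net::HTTP.start(host, port) do |http|
    paths.map { |path| http.request(Net::HTTP::Get.new(path)).body }
  end
end

port, accepted = start_counting_server
bodies = fetch_all '127.0.0.1', port, %w[/a /b /c]

puts "#{bodies.length} responses over #{accepted[0]} connection(s)"
```

Making the same three requests with three separate Net::HTTP.get calls would open three connections, each paying its own TCP setup cost.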

With no offense intended towards the curb maintainers, it appears that the performance problem was introduced six months before the fix, took seven releases to be noticed and six more to be properly diagnosed. I've seen lots of grumbling about performance in the ruby community but it appears that the grumblers are more likely to repeat somebody else's complaints instead of investigating the problems and proposing solutions.

Net::HTTP API

The Net::HTTP API may appear clunky, but it provides many ways to do the same thing for programmer convenience. For the majority of my use cases I use only this:

require 'net/http'
require 'uri'

u = URI.parse 'http://blog.segment7.net/'

http = Net::HTTP.new u.host, u.port

request = Net::HTTP::Get.new u.request_uri

response = http.request request

I typed this out of memory without consulting the documentation!

I will admit that the API documentation is organized to get you up and running quickly with the convenience methods instead of explaining the best way to use it until much later (Example #4 for GET+print, Example #3 for POST), but this is a relatively easy thing to fix.
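As a small sketch of the kind of example the documentation could include, the same URI plus Net::HTTP::Get pattern extends to query parameters. The helper name build_get, the example.com URL and the parameter names are hypothetical; URI.encode_www_form requires Ruby 1.9.2 or later.

```ruby
require 'net/http'
require 'uri'

# Build a GET request whose query string comes from a hash of parameters.
# build_get is a hypothetical helper, not part of Net::HTTP.
def build_get(base, params)
  u = URI.parse base
  u.query = URI.encode_www_form params # escapes keys and values
  Net::HTTP::Get.new u.request_uri
end

request = build_get 'http://example.com/search', 'q' => 'ruby stdlib', 'page' => 2

puts request.path # → /search?q=ruby+stdlib&page=2
```

The resulting request object can be passed to http.request exactly as in the shorter example.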

Net::SMTP

I wrote ar_mailer which sits atop Net::SMTP. I had to add one monkeypatch to Net::SMTP, support for RSET (and I'm a bad person for not committing it to ruby). Everything else is handled easily by Net::SMTP. I don't see how you could argue that it is old or has a crufty API as the SMTP protocol hasn't changed much over the years.

Net::IMAP

I've written several IMAP tools to read my email for me. Net::IMAP has been very pleasant to work with. I've even contributed support for IDLE to Net::IMAP. I don't see how you could argue that it is old or crufty as I've found it relatively easy to support the various IMAP extensions.

WEBrick

I'm probably one of the few people who has used WEBrick in large-scale production. I served millions of HTTP requests with WEBrick at about half the speed of apache (I believe WEBrick took 25ms to serve a static image as opposed to apache which took 12ms).

WEBrick has a wonderful API and features built-in CGI, SSL, cookie handling, Basic and Digest authentication along with multiple methods of password lookup, Proxy, and an easy-to-use handler API.

What WEBrick is missing is the process control features provided by other ruby web servers like mongrel or unicorn.

REXML

We all know that REXML is not the best XML library in ruby and that Nokogiri is, but replacing REXML raises many issues that would need to be resolved.

DRb and Rinda

These two libraries are beautifully simple and provide an excellent example of the power of ruby. There's a terrific amount of fun you can have with these two libraries.
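To give a feel for that simplicity, here is a minimal DRb sketch. The Counter class and the start_counter_service helper are hypothetical names invented for this example, and the client is created in the same process only to keep the sketch self-contained.

```ruby
require 'drb/drb'

# Hypothetical service object; any plain Ruby object can be shared.
class Counter
  def initialize
    @count = 0
  end

  def increment
    @count += 1
  end
end

# Start a DRb server with a Counter as its front object. Port 0 asks the
# operating system for a free port; the URI actually in use is returned.
def start_counter_service
  DRb.start_service('druby://127.0.0.1:0', Counter.new).uri
end

uri = start_counter_service

# A client normally lives in another process and only needs the URI.
counter = DRbObject.new_with_uri uri

puts counter.increment
puts counter.increment
```

Method calls on the DRbObject are forwarded to the Counter held by the server, so the count persists between calls.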

Net::FTP, Net::POP, Net::Telnet, RSS

I've never used any of these libraries and can't say anything good or bad about them. I'd like to see concrete reasons why they should be removed.

Making the Ruby stdlib Better

Mike wants Ruby to get a good spring cleaning but is only proposing outright removal. I don't see how this is at all constructive or how it fixes any issues.

If you have actual problems with these libraries what are they? Have you filed a ticket? Can you fix them? Have you submitted a patch?

When I found the problems with curb I didn't just complain about them. I took about half a day to figure out what the root problems were and filed tickets, as I found I wasn't capable of fixing the problem. This involved digging around in man pages, reading the curb C extension and fiddling with strace. I took the time to make curb better even though I'm probably never going to use curb outside of a benchmark.

Asking the questions "what is wrong?" and "how can I make it better?" makes ruby better for everyone. It's what open source is about.

Saying "we should just remove stuff" without a plan for what it's going to be replaced with and how we're going to avoid breaking existing libraries is nothing more than harmful whining. If you want to see the standard library improved you need to sit down and get to work on making it better or replacing it with something that has the same features and more.

Recent stdlib Improvements

Ruby 1.8's YAML is based on Syck which was written by _why. Even before _why left ruby there were numerous unfixed bugs in Syck. Aaron Patterson attempted to fix these bugs but eventually gave up and wrote a new YAML implementation based on libyaml called Psych. Psych supports the same API as Syck and is an excellent example of how to improve the standard library.

Ruby currently uses a pure-ruby implementation of Date and DateTime. Jeremy Evans has written home_run and is attempting to import it into 1.9. There is currently an objection due to possible incompatibilities between the current date.rb and home_run but these should be resolved shortly.

Aaron's Psych and Jeremy's home_run show that the ruby maintainers are willing and ready to accept replacement of parts of the standard library. I think the expectations of the core team are both fair and reasonable as they are designed to minimize breakage of existing ruby software.

Ruby's stdlib is only a ghetto if you treat it like one.

Comments

But if it doesn’t use fibers, it isn’t roflscale.

raggi said 34 minutes later

My thoughts exactly. ;)

Javier Cicchelli said 36 minutes later

I’m not sure that an HTTP library that doesn’t support multipart can be considered modern :)

ngw said about 1 hour later

ngw: RFC 2616 section 3.7.2 says “In general, HTTP treats a multipart message-body no differently than any other media type: strictly as payload.” So multipart is not a function of HTTP.

It sounds like Ruby is instead missing an RFC 2045 implementation.

Eric Hodel said about 2 hours later

Ruby stdlib RSS is horrible—look at the code. It’s so awful that I wrote my own RSS and Atom parsing library (syndication) rather than deal with it. The result was faster as well as much more straightforward.

Even if stdlib RSS’s code wasn’t horrible, RSS support without Atom is pretty pointless, the RDF stuff doesn’t belong in stdlib, and the feed generation code is duplicating Builder and doing it badly.

telnet is a dead protocol these days and should probably go too, but POP and FTP are still widely used.

mathew said about 2 hours later

@matthew

“telnet is a dead protocol these days and should probably go too” — I wouldn’t go that far. Telnet is a great way to reach many tcp services easily.

bryanl said about 2 hours later

That was quite the dick sandwich you just served.

Ted price said about 2 hours later

There seems to be some apprehension about tackling Ruby stdlib. Everyone agrees it needs more attention and refinement, but few are jumping in. Working on MRI is not that different from any other large open-source project:

1. Identify broken window.
2. Brainstorm possible improvements.
3. Fix/rewrite broken window.
4. Run tests.
5. Submit patch to bug-tracker (http://redmine.ruby-lang.org/)

postmodern said about 3 hours later

postmodern: I became a ruby committer because I wrote documentation for the standard library, then upgraded to fixing RDoc. It’s really, really easy to become a committer for Ruby!

Ted: I think that’s the first completely inane comment on this blog, thanks!

Eric Hodel said about 4 hours later

I agree pretty strongly that Net::HTTP’s API could use some work. It deals with the simple use case great, but there’s no documented example of how to pass a hash of params to a GET request. Compare this to a library like Typhoeus where the api and the documentation are very straightforward.

This is personal opinion here, but I don’t like APIs that provide many different ways of doing the same thing. As a developer trying to understand a library, it can be really frustrating when I’m looking for examples online and all of them are using a different approach to what I’m trying to do.

Mike Emery said about 4 hours later

For me the issue isn’t that the stdlib is a ghetto, it just seems an arbitrary collection of libraries – some the best in their field, others not.

Despite the varied quality and usefulness, their inclusion in the stdlib is an implicit endorsement by the core team.

Instead of convincing developers that fixing corners of the stdlib isn’t hard, breaking them out into gems lowers the (possibly perceived) barriers to entry and encourages competition with new gems.

Moving the standard libraries to gems would also give users of them finer grained control over versions. This would avoid messy situations like the FasterCSV to CSV migration and conflicts between libraries that already exist in the stdlib and rubygems (like soap4r).

James Healy said about 5 hours later

Perhaps the discussion should not be about what to remove and what not to remove, but about how to document and fix what’s already there? Marc-andre had a good blog post about this a while back:

http://blog.marc-andre.ca/2009/09/best-time-to-get-involved-in-ruby-core.html

phiggy said about 6 hours later

The actual code of Net::HTTP and other stdlib offerings are a little crufty… poorly documented, unidiomatic, etc. They are difficult to just dive into and offer improvements for. They need a good scrubbing.

I looked through the RubySpec for Net::HTTP and it made assertions that it behaved like Net::FTP. What? That doesn’t make perfect semantic sense… I know if I dig further (and I will) it will make sense, but that’s just smelly, right?

The docs are in need of a lot of work, too. People get frustrated because they are not properly educated, and that’s usually the fault of the docs.

All of these complaints can easily be addressed: better documentation isn’t that difficult to contribute, and someone can certainly go through and comment, clean, and improve the code without having to change functionality. Time and motivation are the only real factors preventing this from happening, though.

Obviously these libraries are being actively maintained, but it just doesn’t seem like it sometimes.

So, as part of my bitching, I’m following through and giving the RubySpec a good look through and seeing what I can contribute in terms of documentation and code improvements for Net::HTTP. No use in being opinionated and (arguably) knowledgeable unless I actually put it to good use.

Matt Todd said about 8 hours later

Sorry, but home_run will not be accepted shortly, so it’s not a good example.

Shtirlic said about 9 hours later

I can name one stdlib library that has almost no documentation at all, and since most of it is implemented in C there’s no valuable RDoc either. It is called OpenSSL. Try to do a plain RSA signing procedure with it without having to look into the C directly!

My point is that sometimes there ought to be some documentation, at least when most of the functionality is in C and not in Ruby.

Jarmo

Jarmo Pertman said about 10 hours later

I have maintained curb since at least 0.2.x, and I usually use net/http.

Curb is interesting, but it’s a complex library that is in bad need of a good refactoring. A thin C layer and more ruby would make it much more robust.

Reliable timeouts is another thing curb initially offered – I’m not certain that’s the case any more, would need some better methods to test…

Curb and many C gems do not play well with ruby’s thread model; it was very tricky to get curb to allow multiple threads to run a request… Stdlib does not have this issue…

My main use for curb now is for mass downloads.

taf2 said about 13 hours later

Shtirlic: Psych wasn’t accepted overnight, nor was FasterCSV. I don’t see why it is unreasonable for these things to take some time. It’s not as if ruby is preparing for an imminent release.

Matz, Yugui, Nobu, et al. are obviously open to home_run replacing date.rb and they’re not moving forward with this any more slowly than any other replacement that has happened.

Eric Hodel said about 19 hours later