RubyGems Beta Approaching

Eric Hodel | Fri, 05 Oct 2007 18:01:36 GMT


RubyGems 0.9.5 is almost done, and has loads of good stuff in it, including platform support and an improved indexer script:

[W]e were rebuilding the gem index on RubyForge, [...] doing it “in place”, so that the current index would be overwritten and then populated over the course of the build.  These take a fair while – 10 minutes or so – and during that time the index was essentially empty.  Booooo.

Well, no longer.  Eric Hodel has twiddled the gem index builder to build it in a temporary directory and then move it in place.  So those gem index outages should be a thing of the past.  Thanks Eric!

More reliable gem installs via Junior developer

This will also dramatically reduce the number of bulk index updates, since the index will actually be there nearly all the time.
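The temp-directory approach in the quote can be sketched in Ruby. This is a minimal illustration under assumptions, not the actual indexer code: `rebuild_index` is a hypothetical helper, and the caller's block stands in for the real index build.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper (not the real RubyGems indexer): the caller writes the
# new index into a scratch directory, and only the finished directory is
# swapped into place.
def rebuild_index(dest_dir)
  dest_dir = File.expand_path(dest_dir)
  # Build next to the destination so File.rename stays on one filesystem.
  tmp_dir = Dir.mktmpdir('index-build-', File.dirname(dest_dir))
  File.chmod(0o755, tmp_dir) # mktmpdir creates the directory as 0700
  begin
    yield tmp_dir # the slow build happens here, out of clients' sight
  rescue StandardError
    FileUtils.rm_rf(tmp_dir) # never leave a half-built index behind
    raise
  end

  old_dir = "#{dest_dir}.old"
  FileUtils.rm_rf(old_dir)
  # The index is missing only in the gap between these two renames.
  File.rename(dest_dir, old_dir) if File.exist?(dest_dir)
  File.rename(tmp_dir, dest_dir)
  FileUtils.rm_rf(old_dir)
end
```

Because the slow work happens in the scratch directory, the window in which clients find no index shrinks from the whole build (about 10 minutes, per the quote) to the moment between the two `File.rename` calls.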

Also, Wilson Bilkovich added a new Marshal-formatted index that will reduce both bandwidth usage and memory consumption. Where a bulk YAML index update takes 120M or so, a Marshal index update takes about 30M. You’ll have to wait for the beta to test this one out, though.
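The real index format isn't shown in the post, so this is only a sketch under assumptions: the difference for the client is which deserializer runs, since `Marshal.load` reads Ruby's native binary format directly instead of parsing YAML text. `dump_index` and `load_index` are hypothetical names, the sample index is a toy stand-in for the real one, and the 120M and 30M figures come from the post, not from this code.

```ruby
require 'yaml'

# Hypothetical helpers: serialize an index as YAML text or with Ruby's
# binary Marshal format.
def dump_index(index, format)
  case format
  when :yaml    then index.to_yaml
  when :marshal then Marshal.dump(index)
  else raise ArgumentError, "unknown format: #{format}"
  end
end

# Read an index back. Marshal.load can instantiate arbitrary objects,
# so only use it on data from a source you trust.
def load_index(data, format)
  case format
  when :yaml    then YAML.load(data)
  when :marshal then Marshal.load(data)
  else raise ArgumentError, "unknown format: #{format}"
  end
end

# Toy stand-in for an index: gem name => version strings.
index = { 'rake' => ['0.7.3'], 'rails' => ['1.2.3', '1.2.4'] }
restored = load_index(dump_index(index, :marshal), :marshal)
puts restored == index
```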

7 comments


Ah, this will be nice. Thanks for the ongoing work!

Gavin Kistner said 5 minutes later

This all sounds great, thanks :)

Does the new Marshal index affect the loaded (in-memory) size at all?

I was doing some tests recently and found that the strings created while loading RubyGems don’t generally get cleared before the rest of the application loads (GC.start generally isn’t called, and the application has to load for a while before allocations exceed the malloc limit and trigger a collection). I found that calling GC.start after loading RubyGems cut raw string memory usage by over half.

Test program and a few more minor details are here: http://blog.ra66i.org/archives/informatics/2007/10/05/calling-on-the-gc-after-rubygems/

On my development machine I have quite a lot of gems installed (I think?), around 150 or so. I’ve added a GC.start call at the end of rubygems.rb locally and found that I couldn’t measure the effect on RubyGems load times, yet I know it clears at least 45 KB of strings out of memory on my machine (out of a total of 98 KB, without facets installed, heh).
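A rough way to reproduce this kind of measurement (not raggi's test program, which is at the link above) is to total the bytes held by live String objects before and after GC.start. `live_string_bytes` and `string_bytes_around_gc` are hypothetical helpers; `ObjectSpace.each_object` without a block and `Enumerable#sum` assume MRI 2.4 or later. On modern Ruby, RubyGems is loaded at startup, so the numbers will not match the 2007 figures.

```ruby
# Total bytes held by live String objects on the Ruby heap.
def live_string_bytes
  ObjectSpace.each_object(String).sum(&:bytesize)
end

# Load RubyGems (already loaded on modern Ruby, in which case require
# returns false), then measure string memory around a forced collection.
def string_bytes_around_gc
  require 'rubygems'
  before = live_string_bytes
  GC.start
  [before, live_string_bytes]
end

before, after = string_bytes_around_gc
puts "live strings: #{before} bytes before GC.start, #{after} bytes after"
```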

raggi said about 4 hours later

The Marshal index only affects operations with the `gem` command, unless you’re using RubyGems as a library and do a remote operation (installing a gem, listing remote gems, etc.).

`require 'rubygems'` now loads only the classes it needs for #gem and #require to work, so before any gems are required, RubyGems adds about 1.5M of memory on OS X. Running GC.start makes no difference for me when requiring ActiveRecord with trunk RubyGems.

Eric Hodel said about 5 hours later

It’s not requiring a specific gem that makes a difference; it’s requiring RubyGems itself (as shown by the test program I left on my blog).

I guess I’ll just wait and have a look at the new release, then explain myself better if it’s still there…

raggi said 3 days later

Yes, I saw that too, and even with GC.start right after `require 'rubygems'` I get no difference in size.

Eric Hodel said 3 days later

Hrm, I should have run it. I see:

#       strings_no_gems.log:            1833 bytes, change:      100%
#    strings_gems_no_gc.log:           38114 bytes, change:     2079%
#          strings_gems.log:           14877 bytes, change:       39%

It looks like it’s largely interpreter-created garbage, which will be cleaned up as soon as the heap fills.
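The "change" column isn't explained in the comment. Reading it as each log's size as a percentage of the previous line (truncated to an integer) reproduces all three figures, as this small check shows. `LOGS` and `change_column` are hypothetical names, and the byte counts are copied from the output above. On that reading, the run with GC.start kept about 39% of the string bytes left in the run without it.

```ruby
# Byte totals copied from the log output above.
LOGS = [
  ['strings_no_gems.log',    1833],
  ['strings_gems_no_gc.log', 38114],
  ['strings_gems.log',       14877],
].freeze

# Each row's "change": its size as an integer percentage of the previous
# row's size (the first row is compared with itself).
def change_column(logs)
  logs.each_with_index.map do |(name, bytes), i|
    prev = i.zero? ? bytes : logs[i - 1][1]
    [name, bytes * 100 / prev]
  end
end

change_column(LOGS).each { |name, pct| puts format('%24s %6d%%', name, pct) }
```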

Eric Hodel said 3 days later

Quite right,

I came across an interesting article on heap fragmentation in Ruby [1] and thought, “Hmm, that’s interesting; I wonder how fast that happens.” Quite a long time ago I also read _why’s notes on Ruby garbage collection [2], and in particular noticed the `#define GC_MALLOC_LIMIT 8000000` he pointed out.

I haven’t spent the time to find a proper justification for my gut feeling, but I think it may well be worth calling the GC earlier rather than later when you know you might be generating a lot of temporary objects. My second comment regarding running the GC manually was simply that it costs basically nothing in application load time but clears up a lot of space (particularly in this case).
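The suggestion can be sketched as a small wrapper: run a phase known to churn through temporary objects, then call GC.start immediately rather than waiting for the malloc limit to trigger a collection. `collect_after` is a hypothetical name, and the block below is a toy stand-in for a load phase. Whether the extra collection is really free depends on the application; the comment reports it was unmeasurable for RubyGems loading, which this sketch does not verify.

```ruby
# Hypothetical helper: run a phase expected to create many short-lived
# objects, then force a full collection right away.
def collect_after
  result = yield
  GC.start
  result
end

# Toy load phase: builds 10,000 temporary strings and keeps only a count.
matches = collect_after do
  (1..10_000).map { |i| "entry-#{i}" }.count { |s| s.end_with?('0') }
end
puts matches
```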

[1] http://zdavatz.wordpress.com/2007/07/18/heap-fragmentation-in-a-long-running-ruby-process/
[2] http://whytheluckystiff.net/articles/theFullyUpturnedBin.html

:)

raggi said 3 days later
