Streaming zlib processing for Ruby
drbrain
Earlier today I checked in a patch that adds streaming zlib processing to Ruby. This allows you to process a stream without needing to allocate space to hold the entire result. To add the streaming support I changed #inflate and #deflate to accept a block. A handful of other methods (such as #finish) accept a block as well due to the internals of Zlib. Here's an example:
require 'zlib'
# dd if=/dev/zero of=/dev/stdout bs=1m count=1024 | gzip -c > 1G.gz
gzipped = File.binread '1G.gz'
# auto-detect gzip
z = Zlib::Inflate.new Zlib::MAX_WBITS + 32
# This prints NULs (print, unlike puts, adds no newline after each chunk)
z.inflate gzipped do |chunk|
  print chunk
end
# Don't forget to finish the stream, there may be bytes leftover.
print z.finish
If the input was a zlib stream instead of a gzip stream, this would work:
Zlib.inflate deflated do |chunk|
  puts chunk
end
But Zlib.inflate doesn't let you specify the window bits needed for gzip auto-detection.
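The example above still reads the whole compressed file into memory before inflating it. Here is a minimal sketch that streams the input as well. It assumes a Ruby whose Zlib::Inflate#inflate accepts a block as described here. The helper name inflate_io and the 16KB read size are illustrative choices and are not part of Zlib.

```ruby
require 'zlib'

# Hypothetical helper: inflate a zlib or gzip stream from +input+ (any object
# that responds to #read) into +output+ (any object that responds to #write)
# without holding the whole compressed or inflated data in memory.
def inflate_io(input, output, read_size = 16_384)
  # MAX_WBITS + 32 asks zlib to auto-detect a zlib or gzip header
  z = Zlib::Inflate.new Zlib::MAX_WBITS + 32

  while (compressed = input.read(read_size))
    # inflated chunks go to the block instead of accumulating in one String
    z.inflate(compressed) { |chunk| output.write chunk }
  end

  # without a block, #finish returns whatever output is still buffered
  output.write z.finish
ensure
  z.close if z && !z.closed?
end
```

For example, `File.open('1G.gz', 'rb') { |io| inflate_io io, $stdout }` would print the same NULs while reading the compressed file 16KB at a time.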
The biggest difference is in memory used. Here's 1GB of gzip-compressed NULs:
$ ll 1G.gz
-rw-r--r-- 1 drbrain staff 1042069 Jul 9 17:09 1G.gz
Here's the memory used inflating this stream:
$ /usr/bin/time -l make -s runruby > /dev/null
3.56 real 3.25 user 0.29 sys
46682112 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
12644 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
1 signals received
3 voluntary context switches
14 involuntary context switches
Only 46MB used!
Here's without streaming:
$ /usr/bin/time -l make -s runruby > /dev/null
3.82 real 3.21 user 0.60 sys
1079824384 maximum resident set size
0 average shared memory size
0 average unshared data size
0 average unshared stack size
264889 page reclaims
0 page faults
0 swaps
0 block input operations
0 block output operations
0 messages sent
0 messages received
1 signals received
3 voluntary context switches
18 involuntary context switches
Over 1GB used!
As you may have noticed, despite all the extra calls back to ruby to execute the block, the streaming output was 250ms faster, but does that hold up?
Benchmarks
For all these benchmarks I chose files of NULs because they inflate to large outputs. That exercises the growth of the output buffer the most, and buffer growth is what this new feature changes. In general, these benchmarks show that in single-thread use, streaming improves inflate speed, especially as the output size increases, while in multi-thread use, streaming reduces inflate speed unless the output size grows very large.
I chose inflated sizes of 10KB, 100KB and 1GB to test the internals of the zlib extension. At 10KB, buffer expansion will only happen once in streaming mode, but several times in non-streaming mode. 100KB is in the realm of compressed HTTP responses and strikes a reasonable balance between how streaming and non-streaming work is handled. 1GB is extremely large. When multi-threaded, it would tax my laptop's memory capacity in non-streaming mode and tax the GVL in streaming mode.
Using files that are less compressible would be a better real-world example and may change the speedups shown. For single-thread tests, the speedup would likely increase while for multi-thread tests the slowdown would likely decrease.
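The post does not include the benchmark program. Here is a minimal sketch of how a single timing run could be produced. It prints one number per run so the output can be collected into files for ministat. The method name time_inflate, the use of Benchmark.realtime and the default 10KB/1000 parameters are assumptions modeled on the description above.

```ruby
require 'zlib'
require 'benchmark'

# Hypothetical harness: inflate +size+ bytes of NULs +count+ times and
# return the elapsed wall-clock seconds. With stream: true every chunk is
# handed to a block and discarded; otherwise the full result String is built.
def time_inflate(size, count, stream:)
  deflated = Zlib::Deflate.deflate("\0" * size)

  Benchmark.realtime do
    count.times do
      z = Zlib::Inflate.new
      if stream
        z.inflate(deflated) { |_chunk| }
        z.finish { |_chunk| }
      else
        z.inflate(deflated)
        z.finish
      end
      z.close
    end
  end
end

if $PROGRAM_NAME == __FILE__
  # `ruby bench.rb stream` times streaming; any other argument does not
  puts time_inflate(10 * 1024, 1000, stream: ARGV.first == 'stream')
end
```

Running such a script 50 times per mode and redirecting each mode's output to its own file produces the input ministat expects.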
With a 1GB file of NULs, ministat shows a 6% speedup:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| + |
| + |
| ++ |
| ++++ |
| x +++++ |
| x xxx +++++++ |
| x x xxxx xxx x +++++++ |
| x xxxxxxx xxxxx x ++++++++ + |
|x x xxxxxxx xxxxxx x x x x x x ++++++++++ +++ +|
| |_________MA__________| |
| |__A__| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 3.4484119 3.690871 3.497622 3.5034132 0.045816556
+ 50 3.7066622 3.7709568 3.7281418 3.7290156 0.013754228
Difference at 95.0% confidence
0.225602 +/- 0.013422
6.4395% +/- 0.383111%
(Student's t, pooled s = 0.0338255)
I believe this is due to the non-streaming execution needing to realloc() the output buffer during expansion. The streaming version yields the buffer to the block and then creates a new buffer, which avoids the realloc() overhead for large strings.
For 100KB of NULs inflated 100 times, no speedup is shown:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| + xxxxx |
| +++ +xxxxx |
| +++++*xxxxx x |
| +++++++*xxxxxxx + + + + |
|+ +++++++**xxxxx* * xxx + +*++ ++ + + + ++ + x x + +|
| |_______M__A__________| |
||________M____________A____________________| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 0.036626816 0.045912027 0.037170887 0.037605586 0.0017813789
+ 50 0.035252094 0.047364235 0.036613226 0.038649669 0.0033843024
No difference proven at 95.0% confidence
At 10KB of NULs inflated 1000 times each run, streaming is again about 6% faster:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| ++ |
| xxx ++ |
| xxx +++ |
| xxx +++ |
| xxxx x +++ |
| xxxx x ++++ |
| xxxx x ++++ |
| xxxxxx ++++ + |
| xxxxxxx +*+++++ + |
| xxxxxxxx **++++++ ++ * + + * + +|
||____M_A______| |
| |_______M___A___________| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 0.046477079 0.05746603 0.047408819 0.047745066 0.0016545295
+ 50 0.049071074 0.064721823 0.049803019 0.050768676 0.002927008
Difference at 95.0% confidence
0.00302361 +/- 0.000943385
6.33282% +/- 1.97588%
(Student's t, pooled s = 0.00237748)
In this case the improvement is likely due to the way the buffer is expanded. Streaming mode jumps from the initial buffer size (1KB) to the maximum buffer size (16KB) immediately, while non-streaming mode grows the buffer in steps. I'll need further performance tests to see if eliminating stepped output buffer growth is worthwhile for both modes.
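To make the difference in growth concrete, here is a small model that counts buffer expansions for a given output size. It uses the 1KB initial buffer and 16KB maximum step described here. It also assumes that non-streaming mode grows the buffer by half its current size with a 2KB minimum step. That halving rule and the 2KB minimum are assumptions about the extension's internals, not figures from this post.

```ruby
# Hypothetical model of output-buffer growth; STEP_MIN and the halving
# rule are assumptions, INITIAL_SIZE and STEP_MAX follow the text.
INITIAL_SIZE = 1024   # initial output buffer size
STEP_MIN     = 2048   # assumed smallest non-streaming increment
STEP_MAX     = 16_384 # largest increment in either mode

# Non-streaming: one buffer grows by half its size, clamped to
# STEP_MIN..STEP_MAX, until it can hold the whole output.
def non_streaming_expansions(output_size)
  capacity = INITIAL_SIZE
  expansions = 0
  while capacity < output_size
    capacity += [[capacity / 2, STEP_MIN].max, STEP_MAX].min
    expansions += 1
  end
  expansions
end

# Streaming: once the initial buffer fills it is yielded to the block, and
# every later buffer holds STEP_MAX bytes, so each expansion is a new chunk.
def streaming_expansions(output_size)
  return 0 if output_size <= INITIAL_SIZE
  (output_size - INITIAL_SIZE + STEP_MAX - 1) / STEP_MAX
end
```

Under these assumptions, 10KB of output takes four expansions without streaming but only one with streaming. In the non-streaming case, each expansion may also copy everything inflated so far, which fits the realloc() explanation above.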
When multiple zlib streams are processed in parallel, streaming is 20% slower when expanding 10KB streams of NULs 1000 times in four threads:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| + xx |
| + + x xx |
| ++ + x xx |
| +++ + x xx x |
| ++++++ x xx x |
| ++++++++ xxxxx x x |
| + ++++++++ + xxxxxxx x x |
|++ + + ++++++++ ++ + + xxxxxxxxxxxx xxx x x|
| |___MA____| |
| |____A____| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 0.19220901 0.21440268 0.19658995 0.1977196 0.0041590721
+ 50 0.14678812 0.17618799 0.15705991 0.15724883 0.0045392638
Difference at 95.0% confidence
-0.0404708 +/- 0.0017274
-20.4688% +/- 0.87366%
(Student's t, pooled s = 0.00435332)
With 100KB of NULs expanded 100 times in four threads there's a 14% slowdown:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| + +++ + xxxx x |
| + +++ + xxxxxx x |
| + ++ +++++++ ++ + + + xxxxxxxx x + x |
|++ ++++++++++ ++ + + + + +++ ++ xxxxxxxxxxxxxx *xxx x xxx|
| |____M_A______| |
| |________M____A_____________| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 0.069380999 0.076375008 0.071210146 0.071699142 0.0016866267
+ 50 0.057549715 0.073060989 0.060353994 0.061419401 0.0034865206
Difference at 95.0% confidence
-0.0102797 +/- 0.0010867
-14.3373% +/- 1.51564%
(Student's t, pooled s = 0.00273866)
These slowdowns are likely due to increased GVL contention. As the output size grows the slowdown decreases.
With 1GB of NULs expanded once in four threads, there's a 1% speedup:
x stream
+ no-stream
+------------------------------------------------------------------------------+
| x x + |
| x x++x++ |
| x ***x++ |
| xx****** |
| xx****** |
| xx********+++ |
| x**********++ +x x + + + +|
| |__A___| |
||________M__A__________| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 5.1492789 5.3822649 5.2116029 5.2147137 0.043439149
+ 50 5.1704421 6.0051818 5.2339363 5.2634319 0.1322479
Difference at 95.0% confidence
0.0487181 +/- 0.0390566
0.934244% +/- 0.748968%
(Student's t, pooled s = 0.0984288)
The speedup is likely due to realloc() needing to copy the string multiple times in non-streaming mode.
Verifying benchmarks with ministat
drbrain
Most people know to benchmark to verify performance improvements, but comparing a handful of results isn't enough. You need to be sure your results are statistically significant. ministat is a tool written by Poul-Henning Kamp that determines whether the difference between two result sets is statistically significant and displays the size of that difference.
I recently used ministat to measure performance changes for my GVL release patch for Ruby's Zlib extension. The patch will allow multiple zlib or gzip streams to be deflated or inflated in parallel. I was asked to show any performance difference because the GVL release/acquire can be expensive depending upon the work done without the GVL.
I posted a video showing the difference between inflating very large files, but what about single-threaded use of Zlib? What about the worst case? While parallel processing of large streams is great, it shouldn't overly penalize single-thread or small-stream processing. I ran several benchmarks of single-threaded use to show minimal difference, but in the worst-case benchmarks I couldn't be sure there was any significant performance change.
Zlib background
For simplicity I'll only be mentioning inflate from here on, but the same loop handles inflate and deflate of gzip, zlib and raw deflate streams identically.
While the Zlib extension is written in C, I've written this in roughly equivalent ruby, hand-waving over some details of how libz works since they're unimportant to the performance comparison. Before my patch, the run loop looked like this:
setup_input_buffer stream
setup_output_buffer stream
loop do
  status = inflate stream
  schedule_another_thread
  return output_buffer if status == STREAM_END
  if status == BUF_ERROR then
    expand_output_buffer stream
    next
  else
    # handle other errors
  end
end
First the input and output buffers are set up. The input buffer contains the deflated string to be inflated and the output buffer will contain the inflated string. The default output buffer size is 1024 bytes.
Inside the loop, inflate is called, which expands as much of the input stream as will fit in the output buffer. Then ruby is given the chance to schedule another thread to provide fairness, and finally errors are checked.
A BUF_ERROR indicates there isn't enough room in the output buffer and more space needs to be added. The expand_output_buffer function adds up to another 16KB to the output buffer. Combined with scheduling another thread each time inflate returned, fairness is provided.
With my patch the run loop looks something like this:
setup_input_buffer stream
setup_output_buffer stream
release_GVL do
  until interrupted do
    status = inflate stream
    return output_buffer if status == STREAM_END
    if status == BUF_ERROR then
      expand_output_buffer_without_GVL stream
      next
    else
      # …
    end
  end
end
The differences are releasing the GVL, the interrupted flag and the change in expanding the output buffer.
To reduce copying, the Zlib extension uses a ruby String for the output buffer, so expanding it requires GC interaction. The new function mirrors rb_str_resize() but calls realloc(3) directly to avoid GVL contention.
The interrupted flag allows timeouts, a killed thread or a clean shutdown to be processed smoothly. As in the original version, the buffer is only expanded by 16KB at a time so interruptions can be serviced quickly. Without this flag, a large stream would need to complete before the thread it is running in could be shut down, which may take several seconds. (In the C version of release_GVL, rb_thread_blocking_region(), you provide an unblocking function that sets the interrupted flag when ruby wants to regain control of the thread.)
Releasing the GVL allows other threads to run the Ruby VM, so while a stream is being inflated, other threads can run on other CPUs. For large streams this can result in a performance increase when other Ruby threads are running.
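Here is a minimal pure-Ruby model of that loop, showing how the interrupted flag lets work stop between 16KB steps. It is not the extension's C code. SteppedInflater is hypothetical, its "inflate" work is just a byte counter, and #interrupt stands in for the unblocking function passed to rb_thread_blocking_region().

```ruby
# Hypothetical model of the patched run loop.
class SteppedInflater
  STEP = 16_384 # bytes "inflated" per trip through the loop

  attr_reader :produced

  def initialize(total_bytes)
    @remaining   = total_bytes
    @produced    = 0
    @interrupted = false
    @lock        = Mutex.new
  end

  # Stand-in for the unblocking function: it only sets a flag.
  def interrupt
    @lock.synchronize { @interrupted = true }
  end

  def interrupted?
    @lock.synchronize { @interrupted }
  end

  # Returns :stream_end when all output was produced, or :interrupted if
  # the flag was set first. The flag is checked before every step, so no
  # single step does more than STEP bytes of work.
  def run
    until interrupted?
      return :stream_end if @remaining.zero?

      step = [STEP, @remaining].min
      @remaining -= step
      @produced  += step
    end
    :interrupted
  end
end
```

Calling #interrupt from another thread makes a long-running #run return :interrupted after at most one more step.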
Worst Case Benchmark
After inspecting the two implementations we have enough information to create a worst-case scenario for testing the performance of the patch.
Since GVL release/acquire can be expensive, we want to do it as often as possible, which means each call to inflate should do as little work as possible. The buffer should not be expanded, since that allows more than one trip through inflate() per GVL release/acquire, so inflated strings must fit in the 1024 byte output buffer.
I wrote a benchmark that inflates many strings with an inflated size of 1000 bytes using multiple threads, but not so many strings as to prevent a quick return from the benchmark section. (While inflating strings of zero bytes would probably be worse, that seems unrealistic.)
Here's the benchmark program I wrote:
require 'zlib'
require 'benchmark'
r = Random.new 0
file_count = 10_000
# 0...file_count builds exactly file_count strings
deflated = (0...file_count).map do
  input = r.bytes 1000
  Zlib::Deflate.deflate input
end
times = Benchmark.measure do
  # four threads, each inflating every string
  (0..3).map do
    Thread.new do
      deflated.each do |input|
        Zlib::Inflate.inflate input
      end
    end
  end.each do |t|
    t.join
  end
end
puts times.real
Instead of using time, Benchmark.measure is used, but only around the inflate part. This reduces the amount of noise in the benchmark. By choosing only 10,000 files, multiple benchmarks can be run (on my machine the inflate portion takes a little over ½ second). By using the same random seed, the benchmark is repeatable.
Before using ministat, I ran the same benchmark with 100,000 files and had the following results without the patch:
$ for f in `jot 5`; do ruby20 test.rb; done
5.420000 5.970000 11.390000 ( 8.162893)
5.400000 6.270000 11.670000 ( 8.263046)
5.460000 5.920000 11.380000 ( 8.133742)
5.410000 6.290000 11.700000 ( 8.289913)
5.500000 6.620000 12.120000 ( 8.478085)
and with the patch:
$ for f in `jot 5`; do make -s runruby; done
5.120000 6.240000 11.360000 ( 8.039715)
5.240000 6.260000 11.500000 ( 8.097961)
5.280000 5.940000 11.220000 ( 8.004246)
5.210000 6.360000 11.570000 ( 8.171124)
5.240000 6.200000 11.440000 ( 8.054929)
So while running with the patch seemed faster, the results overlap by a small margin. I wanted to be reasonably sure there was an improvement which is where ministat comes in. In order to get useful data out of ministat I would need more data (this is easy to verify by running ministat on these results and seeing that it says "No difference").
To capture the data, I used a shell for loop to run the benchmark 50 times with the patch applied and redirected the output to a file named "with":
$ for f in `jot 50`; do make -s runruby; done > with
I then did the same without the patch applied, directing the output to a file named "without".
ministat
To determine whether the results were statistically significant, I ran the data through ministat with a confidence level of 99%. From the ministat man page:
The ministat command was written by Poul-Henning Kamp out of frustration over all the bogus benchmark claims made by people with no understanding of the importance of uncertainty and statistics.
So through ministat we will be able to determine if the suspected difference from the previous benchmark is an actual difference, and what the certainty of that difference is.
$ ministat -c 99 -s without with
x without
+ with
+------------------------------------------------------------------------------+
| + + + x * xx x |
| + x + + + + + + x **+xx*xx x * x + x |
|+ ++++ x + x*+ * + ++x+x++x*++x* ***+x**xx * +x**** * + x x xx|
| |_____________A_____________| |
| |______________A_______________| |
+------------------------------------------------------------------------------+
N Min Max Median Avg Stddev
x 50 0.61660767 0.71641994 0.66669798 0.66647618 0.022548872
+ 50 0.59242272 0.69294882 0.64941692 0.64917859 0.02509235
Difference at 99.0% confidence
-0.0172976 +/- 0.0125332
-2.59538% +/- 1.88051%
(Student's t, pooled s = 0.0238545)
Looking at the last four lines, you can see there is a statistically significant difference in the benchmark results at 99% confidence. Running the benchmark with the patch shows a 17ms decrease in time ± 13ms or a 2.6% decrease in time ± 1.9% for this sample set, so I can say "minor improvement".
The box shows a plot of the with and without values along with the average (and median) values, and standard deviation bars. For full details see the ministat man page.
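The "Difference at 99.0% confidence" lines come from a two-sample Student's t test with a pooled standard deviation. Here is a minimal Ruby sketch of that calculation, working from the summary statistics ministat prints. The method name is hypothetical. The critical t value is passed in rather than computed: 2.627 is approximately the two-sided 99% value for 98 degrees of freedom, and 1.984 is the 95% value.

```ruby
# Hypothetical re-creation of ministat's difference calculation from
# summary statistics (sample size, mean, sample standard deviation).
# +t+ is the Student's t critical value for the chosen confidence level
# and n1 + n2 - 2 degrees of freedom.
def ministat_difference(n1, mean1, sd1, n2, mean2, sd2, t)
  dof = n1 + n2 - 2
  pooled_s = Math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / dof)
  diff = mean2 - mean1                       # second set minus first (x)
  half_width = t * pooled_s * Math.sqrt(1.0 / n1 + 1.0 / n2)

  {
    difference: diff,
    plus_minus: half_width,
    percent: diff / mean1 * 100,             # relative to the x mean
    percent_plus_minus: half_width / mean1 * 100,
    pooled_s: pooled_s,
    significant: diff.abs > half_width,
  }
end

# Summary statistics from the "without" (x) and "with" (+) runs above
result = ministat_difference(50, 0.66647618, 0.022548872,
                             50, 0.64917859, 0.02509235, 2.627)
```

With these inputs, the sketch reproduces ministat's -0.0173 ± 0.0125 seconds and pooled s of about 0.02385.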
There are a couple of likely reasons for the slight increase in performance. Inflate may take long enough, relative to a GVL release/acquire, that some parallelism is gained, or GVL release/acquire may simply not be as expensive as thought. Without an easy way to measure GVL contention, though, this is only speculation.
Getting ministat
To my knowledge ministat only ships with FreeBSD, but fortunately it is really easy to add to nearly any system with a C compiler. You can find a copy of ministat in this repository (which, ironically, uses autotools) or you can grab the C file directly and run cc ministat.c -lm -o ministat and move the ministat executable into your PATH like I did.
On Community Funding of Open Source
drbrain
The other day Yehuda Katz announced a kickstarter for creating Rails.app, an OS X application that makes it easy to bring new programmers to Rails and Ruby.
I think the idea is fantastic. When I first started learning Ruby I had to do battle with compilers, RubyGems didn't exist, the RAA had but a handful of libraries, and there weren't really any tutorials for learning Ruby around. Getting where I am today meant climbing a much steeper learning curve than many of you had to deal with, with only fortitude or stubbornness on my side, and it sucked.
I think the idea of soliciting the community for funding is fantastic. It's not the first time a Rubyist has solicited money to work on open source. I funded Gregory Brown's work on Prawn a few years ago, and even though I've never used Prawn (although I'd like to someday), I was incredibly happy to support it. I'll probably never use Rails.app either, but that doesn't mean I'm not happy to support it, nor that Rubyists are wrong to support it.
Now, to get to my point, I don't understand why I keep seeing such negative feedback around Yehuda's choice of soliciting money to bring us Rails.app. Sure, it may just be a few of you, but if you don't want to give Yehuda money, fine, just don't. If you think Yehuda is asking for too much money, fine, just don't donate. If you just don't like Yehuda, fine, just don't donate.
But stop with the personal attacks, even the snarky ones and the jokes at Yehuda's expense.
You would not find it so funny if people were ganging up on you.
This is a fantastic idea Yehuda has presented and if he had the ability he would bring it to us for free. Right now he doesn't have that ability so he's asking for our help.
Being a major contributor to open source is incredibly difficult. While thousands or even millions of people may be happily using your software, you mostly hear from the people who are having problems with it, especially after they're extremely frustrated. Throwing insults atop this is incredibly demotivating and depressing, so cut it out.
If you want to continue to complain, whinge or make cruel jokes towards Yehuda or any other Rubyist, why not take that effort and put it towards something positive? Do something that is welcoming to new Rubyists, like improving some ruby documentation, submitting a bug report or submitting a patch to fix a bug.
Stop making our community look like a bunch of jerks. It's fine if you disagree with another Rubyist, especially one who has contributed so heavily to Ruby, but you should have enough respect for them to be polite.
A use of Enumerable#chunk
drbrain
In Ruby 1.9, Enumerable has a few new methods including Enumerable#chunk (which was added in 1.9.2). The #chunk method walks your Enumerable and divides it into chunks of consecutive elements based on the value a block returns for each element. Unlike Enumerable#partition, which sorts elements into just two arrays, the chunks are returned in order. Here's an example from the documentation:
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5].chunk { |n|
  n.even?
}.each { |even, ary|
  p [even, ary]
}
#=> [false, [3, 1]]
# [true, [4]]
# [false, [1, 5, 9]]
# [true, [2, 6]]
# [false, [5, 3, 5]]
When I first saw this method I thought, "this looks like a useful method… but how?"
I'm working on bringing Markdown support to RDoc and the last remaining base Markdown feature I need to support is a hard break due to two spaces at the end of a line in a paragraph.
For background, RDoc parses various formats into a common syntax tree which can then be transformed into any supported output (such as HTML, colored ANSI text, etc.). In this syntax tree a paragraph can contain one or more strings which are joined at output time into the paragraph you see.
To add hard line breaks, I decided to create a new HardBreak object and inject it into the paragraph where two trailing spaces are encountered in the source document. The formatters can then be updated to insert the appropriate line break character when emitting a paragraph.
Enumerable#chunk comes in because the Markdown parser doesn't join strings as it parses (since the grammar rules get re-used); instead, joining is performed as a post-processing step. (String joining as a post-processing step also makes the parser cleaner by hiding the ugliness in one spot rather than spreading it across multiple grammar rules.) Before inserting HardBreak objects this was sufficient:
parts = paragraph.parts.join.rstrip
paragraph.parts.replace [parts]
But now I need to join String chunks and include HardBreaks as-is, which is a perfect use of Enumerable#chunk:
parts = paragraph.parts.chunk do |part|
  String === part
end.map do |string, chunk|
  string ? chunk.join.rstrip : chunk
end.flatten
paragraph.parts.replace parts
The 1.8-compatible implementation is much uglier since I have to track whether I'm in a String chunk or not in addition to performing the processing. I'm too embarrassed to post it, but you'll be able to find it in the rdoc source once I commit and push it.
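Here is a self-contained version of the post-processing step above that can be run outside RDoc. HardBreak is a hypothetical empty stand-in for RDoc's class. flat_map replaces map followed by flatten. It only flattens one level, which makes no difference here because parts are Strings and HardBreaks.

```ruby
# Hypothetical stand-in for RDoc's hard line break node.
class HardBreak
end

# Joins each run of adjacent Strings into one String (with trailing
# whitespace stripped) and keeps HardBreak objects in place.
def join_string_runs(parts)
  parts.chunk { |part| String === part }.flat_map do |string, run|
    string ? run.join.rstrip : run
  end
end
```

For example, `["Hello ", "world", br, "second ", "line  "]` (where br is a HardBreak) becomes `["Hello world", br, "second line"]`.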
hoe-travis
drbrain
hoe-travis is a Hoe plugin that allows your gem to gain maximum benefit from travis-ci.org. The plugin contains a .travis.yml generator and a pre-defined rake task which runs the tests and ensures your manifest file is correct.
With hoe-travis it is easy to add additional checks. Custom checks can be easily verified locally by simply running a rake task instead of committing and pushing a change, waiting for travis to run your tests, then trying a new commit if you didn’t fix the problem.
Features
- .travis.yml generation task
- Pre-defined rake tasks which are run by travis-ci
- Easy to hook up rake tasks for additional travis-ci setup or checks
Getting Started
If you’re not already using Hoe with your project, see: docs.seattlerb.org/hoe/Hoe.pdf
To get started with hoe-travis, first install it:
sudo gem install hoe-travis
Then add hoe-travis as a plugin to your Rakefile:
Hoe.plugin :travis
Then generate a .travis.yml:
$ rake travis:generate
This will bring up your EDITOR with your .travis.yml for any desired tweaks. Save the file when you’re done, then check in your .travis.yml. For further details of how the configuration is generated, see Setup at Hoe::Travis and Configuration at Hoe::Travis.
(If you don’t have the EDITOR environment variable set to your favorite editor, please do so. Note that some editors may need extra flags to wait for files to be edited. For MacVIM, export EDITOR="mvim --remote-wait" will wait for the file to be closed before returning.)
If you would like to make future changes to your .travis.yml you can run:
$ rake travis:edit
Which, like travis:generate, will bring up your EDITOR with your .travis.yml. When you’ve saved the file, the changes will be checked by travis-lint before being written back to .travis.yml, giving you a chance to correct any problems.
If you’ve edited your .travis.yml by hand you can run:
$ rake travis:check
to check it.
Testing your travis-ci setup is easy with hoe-travis. You can run:
$ rake travis
to run the same checks travis-ci will. By default this includes running the tests and ensuring the Manifest.txt file is complete. There is also the before script:
$ rake travis:before
Which will run the setup tasks needed for your project.
You can also enable and disable travis-ci using rake travis:enable and rake travis:disable. See Setup at Hoe::Travis for details.
Forever-valid SSL certificates
drbrain
If your library uses X509 cryptography, your tests will naturally need a key and valid certificate to test against. Creating a key and certificate frequently can quickly drain your entropy pool, which slows down your tests.
Instead of creating the key for every test startup you can create it once and load it off the disk like this:
class TestMyGem < MyGem::TestCase
  private_key = File.expand_path '../../../test/private_key.pem', __FILE__
  private_key = File.read private_key
  PRIVATE_KEY = OpenSSL::PKey::RSA.new private_key
  # …
Sure, you can rebuild the certificate every time with a validity time of an hour, but why not create a forever-valid certificate to go with it? No reasonable person would ever use a key shipped with an open project anyhow. Here's how to generate such a key and certificate:
require 'openssl'
# purposefully short key length
key = OpenSSL::PKey::RSA.new 512
# bogus subject and issuer
name = OpenSSL::X509::Name.parse 'CN=nobody/DC=example'
cert = OpenSSL::X509::Certificate.new
cert.subject = name
cert.issuer = name
cert.version = 2
cert.serial = 1
cert.not_before = Time.now
# lasts as long as X509 allows
cert.not_after = Time.gm 9999, 12, 31, 23, 59, 59
cert.public_key = key.public_key
cert.sign key, OpenSSL::Digest::SHA1.new
open 'private_key.pem', 'w' do |io| io.write key.to_pem end
open 'public_cert.pem', 'w' do |io| io.write cert.to_pem end
You can load this certificate just like the key as described above:
public_cert = File.expand_path '../../../test/public_cert.pem', __FILE__
public_cert = File.read public_cert
PUBLIC_CERT = OpenSSL::X509::Certificate.new public_cert
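Since the certificate is meant to outlive the tests, a cheap sanity check can catch a mismatched or replaced fixture. Here is a minimal sketch using the OpenSSL bindings' Certificate#verify and #check_private_key. The fixture_problems helper is hypothetical and is not part of OpenSSL or any test framework.

```ruby
require 'openssl'

# Hypothetical helper: lists problems with a key/certificate fixture pair.
# An empty array means the pair is usable at time +now+.
def fixture_problems(key, cert, now = Time.now)
  problems = []
  problems << 'certificate is not signed by this key' unless cert.verify(key)
  problems << 'key does not match certificate' unless cert.check_private_key(key)
  problems << 'certificate is not valid yet' if cert.not_before > now
  problems << 'certificate has expired' if cert.not_after < now
  problems
end
```

In a minitest test case holding the constants above, `assert_empty fixture_problems(PRIVATE_KEY, PUBLIC_CERT)` would flag a broken fixture with a readable message.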
Replace your test helpers with reusable API
drbrain
test/test_helper.rb is a great idea Rails brought to the Ruby world as a place for functionality that helps you write better tests. There's now a standard place for you to implement common setup/teardown, shortcuts and custom assertions. However, a test helper is not the best place to store this functionality for a Ruby library.
One of the benefits you get out of writing tests is knowing where your API is clumsy and inadequate. If you have a test helper file full of methods to make your library easy to use in a test why is that not part of your library's API? Wouldn't your users also want a bunch of methods that make your library easier to use in their applications?
For example, in RubyGems we have a test helper that does this: gem_file = Gem::Builder.new(spec).build, which is a little silly. Every time you create a Gem::Builder you want to build a gem. You don't create a Gem::Builder object for fun! To help out RubyGems users I added a new method, gem_file = Gem::Builder.build spec, which immediately creates and builds the gem. This is much nicer for everyone (but really, you should use Gem::PackageTask when building gems).
Whether you're writing a library or a Rails app, this kind of functionality belongs in your library (or application) code, not in the test helper where only your tests can benefit from it.
Even after you improve your API by moving helpful functionality back into your library there's still going to be some things that only make sense for tests. For example, you probably don't want to type t = Some::Deeply::Namespaced::Thing.new 1, 2, 3 many, many times in your tests, so you write a short wrapper method you can call like this: t = thing 1, 2, 3. Your tests may need setup and teardown to maintain a clean environment between tests, custom assertions for readability or you may want to include a pre-built stub or mock.
While having this functionality in a test helper is fine for a Rails app, it shouldn't go in a library's test helper. When you keep testing functionality hidden in the test directory, a user who wants to write a third-party extension for your gem can't access it. Why force a happy user to re-implement (possibly poorly or incorrectly) the work you've done to have nice, clean tests that are easy to read and write?
Instead of having a private test helper I have a public test case like MyGem::TestCase that lives in lib/my_gem/test_case.rb. This gives anyone who wants to extend my libraries a documented, ready-to-go API for writing tests for their extension.
My gem-specific test case typically contains all the requires needed to load the library (ideally require 'my_gem'), proper setup and teardown to sandbox the tests, any utility methods that don't belong in the library itself and possibly some custom assertions. This makes a brand new test easy to start:
require 'my_gem/test_case'
class TestMyGemSomeClass < MyGem::TestCase
  def setup
    super
    # …
  end
  def test_something
    # …
  end
end
There is the minor downside that an extension writer must use minitest (my preferred testing library) to test their extension. Perhaps this inconvenience could be solved by a module providing setup, teardown and shortcuts that is included in the proper place for the extension writer's favorite testing library.
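Here is a minimal sketch of that idea. It assumes a hypothetical MyGem whose class is MyGem::Some::Deeply::Namespaced::Thing. The helpers live in a plain module with no dependency on any testing library, so users of other frameworks can include it and call the sandbox methods from their own setup and teardown hooks. The module and method names are illustrative.

```ruby
require 'fileutils'
require 'tmpdir'

module MyGem
  # Hypothetical library class with an inconveniently long name.
  module Some
    module Deeply
      module Namespaced
        Thing = Struct.new(:a, :b, :c)
      end
    end
  end

  # Framework-neutral test helpers; nothing here requires minitest.
  module TestHelpers
    # Shortcut for MyGem::Some::Deeply::Namespaced::Thing.new
    def thing(*args)
      MyGem::Some::Deeply::Namespaced::Thing.new(*args)
    end

    # Creates a scratch directory so tests don't touch real files.
    def sandbox_setup
      @sandbox_dir = Dir.mktmpdir('my_gem')
    end

    # Removes the scratch directory created by sandbox_setup.
    def sandbox_teardown
      FileUtils.rm_rf @sandbox_dir if @sandbox_dir
      @sandbox_dir = nil
    end
  end
end

# With minitest the module could be wired in like this:
#
#   class MyGem::TestCase < Minitest::Test
#     include MyGem::TestHelpers
#     def setup;    super; sandbox_setup;    end
#     def teardown; sandbox_teardown; super; end
#   end
```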
PS: Actually, Gem::Builder.build is Gem::Package.build since Gem::Format, Gem::Builder and Gem::Package are getting merged into one convenient class that deals with reading and writing gem files for RubyGems 2.0. This means there will only be one place to look for the API of messing with packages and it reduces the implementation of Gem::Installer a bit.
Packaging from a gemspec is not the Best Way
drbrain
Or, why you should use Hoe (or a tool like it) to package your gems.
A while back Yehuda wrote Using .gemspecs as Intended and asserted that packaging directly from a .gemspec file was both the way you were supposed to build gems and the best way to build gems.
This is not true.
Rake::GemPackageTask (now Gem::PackageTask) was added to rake less than two weeks after the first commit on RubyGems, which points to an early recognition of the need for a better way to describe and build a gem than just a file.
Using rake allows you to do more than just package a static list of files in a clean and reusable manner. You can write tasks to generate files that are then packaged in your gem at build time. Hoe builds atop Rake to support many types of generated files easily through task dependencies.
Examples include generating a .gemtest file to work with GemTesters, integrating with rake-compiler for pre-compiled gems, or generating parsers.
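As a small illustration with plain Rake rather than Hoe, a file task can generate a file that is also listed in the gem's files. Gem::PackageTask makes the .gem file task depend on the spec's files, so the generator runs before packaging. The gem name, the man page and its contents are hypothetical.

```ruby
require 'rake'
require 'rubygems/package_task'

# Hypothetical generated file: a man page built from the README.
file 'man/my_gem.1' => 'README.txt' do |t|
  mkdir_p 'man'
  File.write t.name, ".TH MY_GEM 1\n#{File.read 'README.txt'}"
end

spec = Gem::Specification.new do |s|
  s.name    = 'my_gem'
  s.version = '1.0.0'
  s.summary = 'Example of packaging a generated file'
  s.authors = ['Example Author']
  # listing the generated file makes packaging depend on its file task
  s.files   = ['lib/my_gem.rb', 'man/my_gem.1']
end

# Without a block, #define must be called to create the package tasks.
Gem::PackageTask.new(spec).define
```

Running rake gem with such a Rakefile should regenerate man/my_gem.1 whenever README.txt is newer, then build pkg/my_gem-1.0.0.gem.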
I can hear you thinking "you might have a point there, but I don't generate any files, so how does this apply to me?" Well, what if you wanted to include generated man pages in your gem using binman?
In the world of packaging directly from a gemspec the way to automatically generate files is to violate RubyGems' internals by overriding Gem::Specification#initialize. I can't blame Suraj Kurapati for this, he's working within the framework of the popular thing to do and this is the only way he could implement it.
(Hoe, on the other hand, handles such pre-package dependencies through a set of pre-defined rake tasks that you can add dependencies to. If you need to generate a file, you make the package task depend on it. See the examples above.)
How can we make this better?
First of all, stop using gemspecs as the One True Description of your gem. Let rake generate it. While Hoe is obviously the best way to do this, it doesn't support generating the gemspec by default. You can use one of the gemspec plugins for that. I'm sure there are other build tools that have the equivalent built right in.
Second, if you're writing a tool that generates files that will be packaged into a gem, provide a rake task like the one that ships with binman that build tool authors can hook into to make packaging seamless.
marshal-structure 1.0
drbrain
This gem is part of #rbxday, but instead of contributing to Rubinius, it mostly takes from Rubinius!
marshal-structure dumps a tree based on the Marshal format. It supports the Marshal 4.8 format.
INSTALL
gem install marshal-structure
SYNOPSIS
From the command line:
ruby -rpp -rmarshal-structure \
  -e 'pp Marshal::Structure.load Marshal.dump "hello"'
Fancier usage:
require 'pp'
require 'marshal-structure'
ms = Marshal::Structure.new Marshal.dump %w[hello world]
# print the Marshal stream structure
pp ms.construct
# print ruby objects in Marshal stream
pp ms.objects
EXAMPLE
str =
"\004\b{\006:\006a[\031c\006Bm\006C\"\006d/\006e\000i\006" \
"f\0322.2999999999999998\000ff" \
"l+\n\000\000\000\000\000\000\000\000\001\0000TF}\000i\000" \
"S:\006S\006:\006fi\000o:\vObject\000@\017" \
"U:\006M\"\021marshal_dump" \
"Iu:\006U\n_dump\006" \
":\026@ivar_on_dump_str\"\036value on ivar on dump str" \
";\000e:\006Eo;\b\000" \
"I\"\025string with ivar\006:\v@value\"\017some value" \
"C:\016BenString\"\000"
structure = Marshal::Structure.load str
pp structure
Prints:
[:hash, 0, 1, [:symbol, 0, "a"], [:array, 1, 20, [:class, 2, "B"], [:module, 3, "C"], [:string, 4, "d"], [:regexp, 5, "e", 0], [:fixnum, 1], [:float, 6, "2.2999999999999998\u0000ff"], [:bignum, 7, 1, 10, 18446744073709551616], :nil, :true, :false, [:hash_default, 8, 0, [:fixnum, 0]], [:struct, 9, [:symbol, 1, "S"], 1, [:symbol, 2, "f"], [:fixnum, 0]], [:object, 10, [:symbol, 3, "Object"], [0]], [:link, 10], [:user_marshal, 11, [:symbol, 4, "M"], [:string, 12, "marshal_dump"]], [:instance_variables, [:user_defined, 13, [:symbol, 5, "U"], "_dump"], 1, [:symbol, 6, "@ivar_on_dump_str"], [:string, 14, "value on ivar on dump str"]], [:symbol_link, 0], [:extended, [:symbol, 7, "E"], [:object, 15, [:symbol_link, 3], [0]]], [:instance_variables, [:string, 16, "string with ivar"], 1, [:symbol, 8, "@value"], [:string, 17, "some value"]], [:user_class, [:symbol, 9, "BenString"], [:string, 18, ""]]]]
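To give a feel for what walking the Marshal 4.8 format involves, here is a minimal decoder that handles just nil, true, false, Fixnums and Arrays. It produces the same kind of nested arrays as the output above. It is a hypothetical illustration, not marshal-structure's implementation, and it raises on any other type.

```ruby
# Hypothetical mini-decoder for a tiny subset of the Marshal 4.8 format.
class TinyMarshalStructure
  def initialize(data)
    @bytes = data.bytes
    @pos = 0
    @object_index = -1
    major, minor = next_byte, next_byte
    unless major == 4 && minor == 8
      raise ArgumentError, "unsupported Marshal format #{major}.#{minor}"
    end
  end

  def construct
    type = next_byte.chr
    case type
    when '0' then :nil
    when 'T' then :true
    when 'F' then :false
    when 'i' then [:fixnum, read_long]
    when '['
      # arrays are objects, so they get an index before their elements
      index = (@object_index += 1)
      length = read_long
      [:array, index, length, *Array.new(length) { construct }]
    else
      raise ArgumentError, "unsupported type #{type.inspect}"
    end
  end

  private

  def next_byte
    byte = @bytes.fetch(@pos) { raise ArgumentError, 'truncated data' }
    @pos += 1
    byte
  end

  # Marshal's variable-length integer encoding.
  def read_long
    c = next_byte
    c -= 256 if c > 127 # the first byte is signed
    return 0 if c.zero?
    return c - 5 if c > 4 && c < 128
    return c + 5 if c < -4 && c > -129

    if c > 0 # next c bytes hold a little-endian positive value
      (0...c).reduce(0) { |x, i| x | (next_byte << (8 * i)) }
    else     # next -c bytes hold the low bytes of a negative value
      (0...-c).reduce(-1) do |x, i|
        (x & ~(0xff << (8 * i))) | (next_byte << (8 * i))
      end
    end
  end
end
```

For example, `TinyMarshalStructure.new(Marshal.dump([1, [true, false]])).construct` returns `[:array, 0, 2, [:fixnum, 1], [:array, 1, 2, :true, :false]]`.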
Ruby 1.9.3 preview 1
drbrain
Ruby 1.9.3 preview 1 has been released! Please install it and report bugs! The NEWS file contains the changes since Ruby 1.9.2 and the ChangeLog contains all the gory details.
If you're using RVM you can follow these instructions to install preview 1. If not, you can download the tarball, unpack it, then run configure; make; make install as you like.
Bug Reports
Ruby 1.9.3 preview 1 ships with RDoc, RubyGems and Rake. I can fix critical bugs in all of these before the final release. Please file bug reports for Ruby via redmine for any issue you find. You can also search for existing issues to avoid duplicate reports.
RDoc
The preview 1 release contains RDoc 3.8, but I've released RDoc 3.9.1 to fix a few additional bugs. Please install RDoc 3.9.1 atop the preview 1 release, or standalone on your main ruby. I prefer reports of RDoc issues via github.
RubyGems
The preview 1 release contains RubyGems 1.8.6.1 which is slightly newer than RubyGems 1.8.6. RubyGems 1.8.7 will be released shortly with the combined changes. I prefer reports of RubyGems issues via github.
Rake
The preview 1 release contains Rake 0.9.2.1 which is slightly newer than Rake 0.9.2. I prefer reports of Rake issues via github.

