The arrow of time

Ivan Voras' blog

Using multiple CPUs with gzip

gzip is of course a ubiquitous compression program used on Unix-like systems everywhere. Unfortunately, like most old utilities, it is completely sequential and single-threaded, which is a shame in a world where it is becoming hard to find a desktop CPU without at least 4 cores.

There is a quick and dirty hack around it:

gzip -2 < bigfile.vmdk | gzip -9 > bigfile.vmdk.2gz

The theory behind the trick is to use two compressors, one fast and one slow. The fast one should be fast enough to make use of the available disk bandwidth, and the slow one should be fast enough to keep up with whatever the first one feeds it. This two-stage model should be extensible to more stages / CPUs, but it gets tricky with regard to tuning the speed. Since decompression is so much faster, I doubt it will see much multi-CPU usage (of course it still needs two passes).
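To make the "two passes" part concrete, here is a minimal sketch of the round trip as two shell functions. The names compress2 and decompress2 are made up for illustration; only the gzip flags come from the command above, and the decompression side is simply my assumption of the obvious inverse (two gzip -dc stages).

```shell
#!/bin/sh
# Hypothetical helper names; not part of gzip itself.

# Stage 1 (gzip -2) and stage 2 (gzip -9) run as separate processes
# connected by a pipe, so the scheduler can put them on different cores.
compress2() {
    gzip -2 < "$1" | gzip -9 > "$2"
}

# The output is gzip inside gzip, so it has to be unpacked twice.
decompress2() {
    gzip -dc < "$1" | gzip -dc > "$2"
}
```

Usage would be something like compress2 bigfile.vmdk bigfile.vmdk.2gz followed later by decompress2 bigfile.vmdk.2gz bigfile.vmdk. Note that a plain sh pipeline only reports the exit status of its last command, so a failure in the first stage can go unnoticed.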

I found that it sort-of works, but the bottleneck soon becomes available disk bandwidth :)

last pid: 36628;  load averages:  1.48,  0.64,  0.37  up 4+03:31:30  14:34:54
184 processes: 3 running, 181 sleeping
CPU: 31.5% user, 0.0% nice, 2.2% system, 0.3% interrupt, 65.9% idle
Mem: 1130M Active, 1488M Inact, 935M Wired, 52M Cache, 392M Buf, 75M Free
Swap: 4094M Total, 2972K Used, 4091M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
36626 ivoras 1 113 0 8032K 1384K CPU1 1 0:38 81.98% gzip -2
36627 ivoras 1 111 0 8032K 1396K CPU2 2 0:33 69.19% gzip -9

#1 Re: Using multiple CPUs with gzip

Added on 2010-04-23T15:03 by

Why not use /usr/ports/archivers/pigz/ ?

#2 Re: Using multiple CPUs with gzip

Added on 2010-04-23T15:13 by Ivan Voras

For fun :)

#3 Re: Using multiple CPUs with gzip

Added on 2010-04-23T21:13 by Ted Mittelstaedt

There is an SMP gzip that has been out there for a while, see:

http://lemley.net/mgzip.html

#4 Re: Using multiple CPUs with gzip

Added on 2010-04-26T19:36 by DES

Hmm, is bzip2 parallelized? IIRC no information is retained from one block to the next, so you could compress an arbitrary number of blocks in parallel, in any order.
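A rough sketch of that idea, under a few assumptions: it does not touch bzip2's internal blocks at all, but instead splits the input into chunks, compresses each chunk as its own independent bzip2 stream, and concatenates the results (bzip2 can decompress a file made of concatenated streams). The function name parallel_bzip2 and its parameters are hypothetical, and xargs -P is an extension found in the GNU and BSD versions rather than something POSIX guarantees.

```shell
#!/bin/sh
# Hypothetical helper: parallel_bzip2 SRC OUT [JOBS] [CHUNK_BYTES]
parallel_bzip2() {
    pb_src=$1
    pb_out=$2
    pb_jobs=${3:-4}
    pb_chunk=${4:-10485760}   # 10 MiB per chunk by default

    pb_work=$(mktemp -d) || return 1

    # Cut the input into fixed-size pieces: chunk.aaaa, chunk.aaab, ...
    # (an empty input produces no chunks and is not handled here).
    split -a 4 -b "$pb_chunk" "$pb_src" "$pb_work/chunk." || return 1

    # Compress the pieces independently, at most $pb_jobs at a time.
    (cd "$pb_work" && printf '%s\n' chunk.* |
        xargs -P "$pb_jobs" -n 1 bzip2 -9) || return 1

    # The glob sorts the pieces back into their original order.
    cat "$pb_work"/chunk.*.bz2 > "$pb_out"
    rm -rf "$pb_work"
}
```

The result should be readable with a plain bzip2 -dc, at the cost of a slightly worse ratio when the chunks are small.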

#5 Re: Using multiple CPUs with gzip

Added on 2010-05-04T14:19 by fidaj

For DES: /usr/ports/archivers/pbzip2

Comments are subject to moderation and will be deleted if deemed inappropriate. All content is © Ivan Voras. Comments are owned by their authors... who agree to basically surrender all rights by publishing them here :)