Which should I rely on: lbzip2 or pbzip2?

As bzip2 claims to compress best (in size), I decided to use it. The working server offers 24 (virtual) CPUs (4 real X5650 @ 2.67GHz), so I decided to look for parallel variants.
I am using Debian stable (sorry, but I found the best matches here on Ask Ubuntu), and I decided to take a closer look at pbzip2 and lbzip2.
But which to select? In current stable, pbzip2 is at version 1.1.1-1 and lbzip2 at version 0.23-1. The version numbers might superficially favor pbzip2, but lbzip2 says it is faster even on single-core computers. On the other hand, pbzip2 claims to be completely compatible with bzip2 v1.0.2.
Additionally, I have some timing values from a big local job:
Using lbzip2

    Command being timed: "tar -cjf /tmp/mapleTAsicherung.lbzip2.tar /bin /etc /lib /lib32 /opt /sbin /selinux /usr"
    User time (seconds): 2134.32
    System time (seconds): 39.24
    Percent of CPU this job got: 2099%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 1:43.51
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1509088
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1054467
    Voluntary context switches: 153901
    Involuntary context switches: 235285
    Swaps: 0
    File system inputs: 0
    File system outputs: 3460632
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0  

Using pbzip2

    Command being timed: "tar -cjf /tmp/mapleTAsicherung.pbzip2.tar /bin /etc /lib /lib32 /opt /sbin /selinux /usr"
    User time (seconds): 3158.18
    System time (seconds): 59.80
    Percent of CPU this job got: 2095%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 2:33.56
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 1436320
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 477683
    Voluntary context switches: 151326
    Involuntary context switches: 339246
    Swaps: 0
    File system inputs: 0
    File system outputs: 3460536
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

What should one use? What are the major differences? At the moment I tend towards lbzip2.


Solution 1:

Here's a basic idea of how to evaluate them.

Take a big tarball of the kind you usually work with. Compress it with bzip2, pbzip2, lbzip2. Measure the (wall clock) times and save all the outputs in different files. This will give you three times and three file sizes.

Then iterate over all three output files (i.e. the compression outputs of bzip2, pbzip2, lbzip2), and decompress each with all three utilities (bzip2, pbzip2, and lbzip2). This will give you nine further times.

Re-run the twelve tests under some profiler and get the peak memory usage (virtual and RSS) for each. Again, this will yield 12 values. (If your Linux is configured not to overcommit, then you're interested in VSZ. Otherwise you care about RSS.)
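One possible way to collect the peak RSS is GNU time with `-v`, which produces the kind of report shown in the question. The sketch below is only an illustration: `measure` and `time_field` are hypothetical helper names, and it assumes GNU time is installed as /usr/bin/time (on Debian that is the separate `time` package). It covers RSS only, not VSZ.

```shell
# Run a command under GNU time and save the verbose report to a file.
# Usage: measure report.txt lbzip2 -k big.tar
# Assumption: GNU time lives at /usr/bin/time (not the shell builtin).
measure() {
  local report="$1"
  shift
  /usr/bin/time -v -o "$report" "$@"
}

# Print the value of one field from a saved `time -v` report.
# Usage: time_field "Maximum resident set size" report.txt
time_field() {
  # Fields look like "<label>: <value>"; match the label literally.
  awk -F': ' -v key="$1" 'index($0, key) { print $2; exit }' "$2"
}
```

For example, after `measure lbzip2.report lbzip2 -k big.tar`, calling `time_field "Maximum resident set size" lbzip2.report` prints the peak RSS in kilobytes as reported by time.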

Make a table with 12 rows for these data points -- col1: 3 compressed sizes, col2: 3 compression times / 9 decompression times, col3: 12 peak mems -- and choose what suits you best. You should factor in how often you compress vs. how often you decompress.
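The timing part of the procedure above could be scripted roughly as follows. This is a minimal sketch, not a polished benchmark: `timed_run` and `run_benchmark` are hypothetical names, it assumes GNU date (for `%N`), it feeds each tool through stdin/stdout with `-c` and `-dc`, and it silently skips tools that are not installed. Memory measurement is left out.

```shell
# Run "cmd..." with stdin from $1 and stdout to $2; print elapsed seconds.
timed_run() {
  local src="$1" dst="$2" start end
  shift 2
  start=$(date +%s.%N)   # assumes GNU date for nanosecond resolution
  "$@" <"$src" >"$dst"
  end=$(date +%s.%N)
  awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f\n", e - s }'
}

# Usage: run_benchmark big.tar /tmp/bz2bench bzip2 pbzip2 lbzip2
# Output is tab-separated: phase, tool, source-of-input, seconds, bytes.
run_benchmark() {
  local input="$1" workdir="$2" tool other secs size
  shift 2
  mkdir -p "$workdir"
  # Compression: one time and one size per tool.
  for tool in "$@"; do
    command -v "$tool" >/dev/null || continue
    secs=$(timed_run "$input" "$workdir/$tool.bz2" "$tool" -c)
    size=$(wc -c <"$workdir/$tool.bz2" | tr -d ' ')
    printf 'compress\t%s\t-\t%s\t%s\n' "$tool" "$secs" "$size"
  done
  # Decompression: every tool decompresses every tool's output.
  for tool in "$@"; do
    command -v "$tool" >/dev/null || continue
    for other in "$@"; do
      command -v "$other" >/dev/null || continue
      secs=$(timed_run "$workdir/$tool.bz2" /dev/null "$other" -dc)
      printf 'decompress\t%s\t%s\t%s\t-\n' "$other" "$tool" "$secs"
    done
  done
}
```

With all three tools installed this prints 3 compression rows and 9 decompression rows, which can be pasted into the table. Run it more than once, since the first pass also warms the page cache.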

I use lbzip2-0.23, but I wrote it, so it doesn't count.

Finally, no matter which one proves best for you, always save a checksum of the uncompressed tarball, plus verify your saved file before declaring the backup "done".

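# Fill in the files to archive and the compressor to use.
# The pipeline below needs a shell with process substitution, such as bash.
FILES=...
OUTDIR=/mnt/archive
BZ2_UTIL=...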

(
  tar -c -- $FILES \
  | tee >(sha256sum >"$OUTDIR"/myfiles.tar.sha256) \
  | pv -c -N plain 2>/dev/tty \
  | "$BZ2_UTIL" \
  | pv -c -N compr 2>/dev/tty \
  > "$OUTDIR"/myfiles.tar.bz2
) 2>"$OUTDIR"/myfiles.err

"$BZ2_UTIL" -dc -- "$OUTDIR"/myfiles.tar.bz2 \
| sha256sum -c -- "$OUTDIR"/myfiles.tar.sha256

Solution 2:

I did some comparison benchmarks for bzip2 vs pbzip2 and lbzip2, along with lzip and plzip, at http://vbtechsupport.com/1614/. I like the speed improvements of lbzip2, as long as you have enough memory.