Which should I rely on: lbzip2 or pbzip2?
Since bzip2 claims to compress best (in size), I decided to use it. The server offers 24 (virtual) CPUs (4 real X5650 @ 2.67GHz), so I decided to look for parallel variants.
Using Debian stable (sorry, but I found the best matches here on Ask Ubuntu), I decided to take a closer look at pbzip2 and lbzip2.
But which one to select? In the current stable release, pbzip2 is at version 1.1.1-1 and lbzip2 at version 0.23-1. Cosmetically, that might favor pbzip2, but lbzip2 says it is faster even on single-core computers. On the other hand, pbzip2 claims to be completely compatible with bzip2 v1.0.2.
Additionally, I have some timing values from a big local job:
Using lbzip2
Command being timed: "tar -cjf /tmp/mapleTAsicherung.lbzip2.tar /bin /etc /lib /lib32 /opt /sbin /selinux /usr"
User time (seconds): 2134.32
System time (seconds): 39.24
Percent of CPU this job got: 2099%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:43.51
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1509088
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 1054467
Voluntary context switches: 153901
Involuntary context switches: 235285
Swaps: 0
File system inputs: 0
File system outputs: 3460632
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
Using pbzip2
Command being timed: "tar -cjf /tmp/mapleTAsicherung.pbzip2.tar /bin /etc /lib /lib32 /opt /sbin /selinux /usr"
User time (seconds): 3158.18
System time (seconds): 59.80
Percent of CPU this job got: 2095%
Elapsed (wall clock) time (h:mm:ss or m:ss): 2:33.56
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 1436320
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 0
Minor (reclaiming a frame) page faults: 477683
Voluntary context switches: 151326
Involuntary context switches: 339246
Swaps: 0
File system inputs: 0
File system outputs: 3460536
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
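To put the two reports side by side, the elapsed times can be converted to seconds and divided. This is only a rough sketch; the helper names `to_seconds` and `elapsed_ratio` are made up for illustration, and the two inputs are simply the wall-clock values copied from the reports above.

```shell
# Hypothetical helper: convert GNU time's "h:mm:ss" or "m:ss.cc" elapsed
# format into plain seconds.
to_seconds() {
    printf '%s\n' "$1" |
        awk -F: '{ s = 0; for (i = 1; i <= NF; i++) s = s * 60 + $i; printf "%.2f\n", s }'
}

# Hypothetical helper: ratio of two elapsed times (first divided by second).
elapsed_ratio() {
    awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

# Wall-clock times from the two reports above: pbzip2 first, then lbzip2.
elapsed_ratio 2:33.56 1:43.51
```

For these two runs the ratio comes out at about 1.48, so pbzip2 needed roughly half again as long as lbzip2 in wall-clock terms; the user CPU times (3158.18 s vs. 2134.32 s) give nearly the same ratio.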
What should one use? What are the major differences? At the moment I lean towards lbzip2.
Solution 1:
Here's a basic idea of how to evaluate them.
Take a big tarball of the kind you usually work with. Compress it with bzip2, pbzip2, lbzip2. Measure the (wall clock) times and save all the outputs in different files. This will give you three times and three file sizes.
Then iterate over all three output files (i.e. the compression outputs of bzip2, pbzip2, and lbzip2), and decompress each with all three utilities (bzip2, pbzip2, and lbzip2). This will give you nine more times.
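The 3 compression runs and 9 decompression runs described so far could be scripted roughly as follows. This is a sketch under assumptions, not a finished benchmark: the function names `elapsed` and `bench_matrix` and the `sample.*.bz2` file names are invented here, `date +%s.%N` assumes GNU date, and every tool is assumed to accept `-c` and `-dc` on standard input the way bzip2 does.

```shell
# Hypothetical helper: seconds between two "date +%s.%N" timestamps.
elapsed() {
    awk -v s="$1" -v e="$2" 'BEGIN { printf "%.2f", e - s }'
}

# bench_matrix INPUT OUTDIR TOOL...
# Compress INPUT with every TOOL, then decompress every result with every
# TOOL. Prints one tab-separated row per run (3 tools give 3 + 9 rows).
bench_matrix() {
    input="$1"; outdir="$2"; shift 2
    printf 'phase\tcompressor\tdecompressor\tseconds\tbytes\n'
    for c in "$@"; do
        out="$outdir/sample.$c.bz2"
        start=$(date +%s.%N)
        "$c" -c <"$input" >"$out" || return 1
        end=$(date +%s.%N)
        printf 'compress\t%s\t-\t%s\t%s\n' "$c" \
            "$(elapsed "$start" "$end")" "$(wc -c <"$out" | tr -d ' ')"
    done
    for c in "$@"; do
        for d in "$@"; do
            start=$(date +%s.%N)
            "$d" -dc <"$outdir/sample.$c.bz2" >/dev/null || return 1
            end=$(date +%s.%N)
            printf 'decompress\t%s\t%s\t%s\t-\n' "$c" "$d" \
                "$(elapsed "$start" "$end")"
        done
    done
}

# Example call (paths are placeholders):
#   bench_matrix big.tar /tmp/bench bzip2 pbzip2 lbzip2
```

On a machine with a large page cache, it is probably worth running the whole matrix twice and keeping the second set of numbers, so that the first tool is not penalized for reading the input from disk.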
Re-run the twelve tests under some profiler and get the peak memory usage (virtual and RSS) for each. Again, this will yield 12 values. (If your Linux is configured not to overcommit, then you're interested in VSZ. Otherwise you care about RSS.)
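If the "profiler" is GNU time in verbose mode (the report format shown in the question), the peak RSS can be pulled out of a saved report with a small filter. A minimal sketch; `peak_rss_kb` is a made-up helper name, and this report does not contain VSZ, which would need a different tool.

```shell
# Print the "Maximum resident set size" value (in kbytes) from a GNU
# "time -v" report read on standard input.
peak_rss_kb() {
    awk -F': ' '/Maximum resident set size/ { print $2 }'
}

# Example use, assuming GNU time is installed as /usr/bin/time and
# sample.tar is a placeholder for your test tarball:
#   /usr/bin/time -v -o lbzip2.time lbzip2 -c <sample.tar >sample.tar.bz2
#   peak_rss_kb <lbzip2.time
```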
Make a table with 12 rows for these data points -- col1: 3 compressed sizes, col2: 3 compression times / 9 decompression times, col3: 12 peak memory figures -- and choose what suits you best. You should factor in how often you compress vs. how often you decompress.
I use lbzip2-0.23, but I wrote it, so it doesn't count.
Finally, no matter which one proves best for you, always save a checksum of the uncompressed tarball, and verify your saved file before declaring the backup "done".
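# Needs bash for the >(...) process substitution.
# Files to back up, destination directory, and the bzip2-compatible
# compressor to use (bzip2, pbzip2 or lbzip2).
FILES=...
OUTDIR=/mnt/archive
BZ2_UTIL=...
# Stream the tarball through the compressor while hashing the uncompressed
# data on the side; pv shows progress for the plain and compressed streams.
(
  tar -c -- $FILES \
  | tee >(sha256sum >"$OUTDIR"/myfiles.tar.sha256) \
  | pv -c -N plain 2>/dev/tty \
  | "$BZ2_UTIL" \
  | pv -c -N compr 2>/dev/tty \
  > "$OUTDIR"/myfiles.tar.bz2
) 2>"$OUTDIR"/myfiles.err
# Decompress the saved archive and compare it against the stored checksum.
"$BZ2_UTIL" -dc -- "$OUTDIR"/myfiles.tar.bz2 \
| sha256sum -c -- "$OUTDIR"/myfiles.tar.sha256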
Solution 2:
I did some comparison benchmarks of bzip2 vs. pbzip2 and lbzip2, along with lzip and plzip, at http://vbtechsupport.com/1614/. I like the speed improvements of lbzip2, as long as you have enough memory, that is.