"tar czf" versus "tar cf - | gzip": are they different? (or how to improve a backup)

I want to speed up my backup, which I do the common way with tar czf. The set of backed-up files grows day by day, so the backup keeps getting slower.

I was thinking of taking advantage of the several cores available on my server, and I was wondering whether there is any difference between doing the backup with tar czf and piping tar to gzip: tar cf - | gzip

My guess is that there is no difference, because the first form also spawns two processes (tar and gzip), much like the explicit pipe.
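For reference, here is a minimal sketch of the two forms side by side, assuming a throwaway directory created with mktemp (the names cmp_dir, data, one.tar.gz and two.tar.gz are made up for illustration). It only checks that both archives list the same members; the compressed bytes themselves may differ, for example in the gzip header.

```shell
# Hypothetical demo data in a temporary directory
cmp_dir=$(mktemp -d)
mkdir -p "$cmp_dir/data"
printf 'hello\n' > "$cmp_dir/data/a.txt"
printf 'world\n' > "$cmp_dir/data/b.txt"

# Form 1: tar handles the gzip compression itself
tar czf "$cmp_dir/one.tar.gz" -C "$cmp_dir" data

# Form 2: tar writes the archive to stdout and gzip compresses the stream
tar cf - -C "$cmp_dir" data | gzip > "$cmp_dir/two.tar.gz"

# Compare the member listings of the two archives
list_one=$(tar tzf "$cmp_dir/one.tar.gz" | sort)
list_two=$(tar tzf "$cmp_dir/two.tar.gz" | sort)

if [ "$list_one" = "$list_two" ]; then
    echo "both archives contain the same members"
fi
```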

If there is no difference, do you know of a good alternative that does not involve going incremental? I'm also looking at pigz, and it looks promising.
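What I have in mind with pigz is roughly the following sketch. It assumes pigz may or may not be installed, so it falls back to plain gzip; the directory and file names are placeholders. pigz writes ordinary gzip output and by default uses all online processors, so the result can still be checked and unpacked with the usual tools.

```shell
# Hypothetical source and destination directories
backup_src=$(mktemp -d)
backup_out=$(mktemp -d)
printf 'some data to back up\n' > "$backup_src/notes.txt"

# Use pigz when it is installed; otherwise fall back to plain gzip
if command -v pigz >/dev/null 2>&1; then
    compressor=pigz
else
    compressor=gzip
fi

# tar produces the archive stream, the chosen compressor does the gzip work
tar cf - -C "$backup_src" . | "$compressor" > "$backup_out/backup.tar.gz"

# The output is regular gzip data either way, so gzip can verify it
gzip -t "$backup_out/backup.tar.gz" && echo "archive OK (compressed with $compressor)"
```

GNU tar can also start the compressor itself via --use-compress-program=pigz, which should behave the same as the explicit pipe.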


Solution 1:

When you say you want to take advantage of multiple cores, the implication is that your tar with gzip is CPU bound rather than IO bound. Are you sure this is the case? If not, run sar, iostat, or top, or check your monitoring graphs, to find out. It is never a good idea to try to solve a problem without understanding it first. I'm not saying this is definitely your situation, but my guess is that even with gzip compression you are more likely to be IO bound.
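If those monitoring tools are not at hand, a rough alternative is to time the read stage alone and then the read plus compress stage. This is only a sketch under assumptions: src_dir is a placeholder filled with random sample data (point it at the real backup source instead), and date +%s gives whole-second resolution, so it is only meaningful on a run that takes a while.

```shell
# Hypothetical sample data; replace src_dir with the real backup source
src_dir=$(mktemp -d)
head -c 1048576 /dev/urandom > "$src_dir/sample.bin"

# Stage 1: read and archive only. The stream goes to wc rather than
# /dev/null, because GNU tar skips reading file contents when the
# archive itself is /dev/null.
start=$(date +%s)
raw_bytes=$(tar cf - -C "$src_dir" . | wc -c | tr -d ' ')
read_secs=$(( $(date +%s) - start ))

# Stage 2: the same read, plus gzip compression
start=$(date +%s)
gz_bytes=$(tar cf - -C "$src_dir" . | gzip | wc -c | tr -d ' ')
total_secs=$(( $(date +%s) - start ))

echo "read only:   ${read_secs}s for ${raw_bytes} bytes"
echo "read + gzip: ${total_secs}s for ${gz_bytes} bytes"
```

If the second figure is much larger than the first, compression is probably the bottleneck; if they are close, the disks are. Keep in mind that the second run may be served from the page cache, which makes the comparison less reliable on small data sets.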

If it is IO bound, and you have multiple arrays, a separate process for each array might make sense.

I also second David's advice to consider incremental backups.

Solution 2:

You're unlikely to improve on the raw performance of tar and gzip by switching between these two forms. To take better advantage of the hardware, you could split the folders into separate groups and build multiple archives simultaneously.
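A minimal sketch of the simultaneous-archives idea, assuming one archive per top-level folder. The folders home, etc and var under src_root are made-up sample data, and out_dir is a placeholder destination.

```shell
# Hypothetical source tree with three top-level folders
src_root=$(mktemp -d)
out_dir=$(mktemp -d)
for d in home etc var; do
    mkdir -p "$src_root/$d"
    printf 'sample %s\n' "$d" > "$src_root/$d/file.txt"
done

# Start one tar process per top-level folder, each in the background
for dir in "$src_root"/*/; do
    name=$(basename "$dir")
    tar czf "$out_dir/$name.tar.gz" -C "$src_root" "$name" &
done

# Block until every background tar has finished
wait

archive_count=$(ls "$out_dir" | wc -l | tr -d ' ')
echo "created $archive_count archives in $out_dir"
```

This only helps if the folders are of comparable size and the disks can keep up with several readers at once; on a single IO-bound disk it may even be slower.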

Why do you not want to go incremental? I would recommend using rsnapshot even if you're doing this locally, as it can use hard links to save disk space while still keeping exact copies from multiple points in time.
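To illustrate why hard links save space, here is a small sketch of the underlying idea. It is not rsnapshot itself, just plain ln on a made-up snapshot layout: the directory names daily.0 and daily.1 follow rsnapshot's naming style, and report.txt is a placeholder.

```shell
# Hypothetical snapshot root with an older and a newer snapshot
snap_root=$(mktemp -d)
mkdir -p "$snap_root/daily.1" "$snap_root/daily.0"

# The older snapshot contains one file
printf 'unchanged contents\n' > "$snap_root/daily.1/report.txt"

# The file did not change, so the newer snapshot links to it instead of copying
ln "$snap_root/daily.1/report.txt" "$snap_root/daily.0/report.txt"

# Both names refer to the same inode, so the data is stored only once
inode_old=$(ls -i "$snap_root/daily.1/report.txt" | awk '{print $1}')
inode_new=$(ls -i "$snap_root/daily.0/report.txt" | awk '{print $1}')
echo "daily.1 inode: $inode_old, daily.0 inode: $inode_new"
```

Each snapshot directory still looks like a complete copy, but only files that changed between runs take up additional space.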