Block-level deduplication on Linux

Solution 1:

Check out lessFS, a data-deduplication filesystem for Linux. It is still in beta, but you can try it out:

http://www.lessfs.com/

Regards,

MV

Solution 2:

Deduplication is coming to ZFS on OpenSolaris, but that functionality is not available yet.

Jeff Bonwick and Bill Moore prototyped it this past winter, and they are working on integrating it this summer. So it should be available in the next release of OpenSolaris, or sooner if you want to play around with the development branch.

Solution 3:

For people who may be unfamiliar with data deduplication: it is a technique in which data is analyzed at the file (or, I suppose, block) level, and identical files or blocks throughout the filesystem are replaced with a smaller token that refers to a single stored copy. When a lot of data is duplicated, this can greatly shrink the space actually used on disk. It could be considered a form of copy-on-write, since a shared copy has to be duplicated before any one of its users can modify it. The Wikipedia article on data deduplication covers it in more detail.
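
To make the idea concrete, here is a minimal in-memory sketch of block-level deduplication. It assumes fixed-size blocks and uses a SHA-256 digest as the "token"; the class DedupStore and its methods are hypothetical illustrations, not how lessFS or ZFS actually implement it. The overwrite_block method shows the copy-on-write relationship: a shared block is never changed in place, the modified file just gets a reference to a new block.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real systems choose their own


class DedupStore:
    """Hypothetical toy block-level deduplicating store (in memory only)."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.blocks: dict[str, bytes] = {}      # digest -> the single stored copy
        self.refcount: dict[str, int] = {}      # digest -> number of references
        self.files: dict[str, list[str]] = {}   # file name -> ordered list of digests

    def _put_block(self, chunk: bytes) -> str:
        # Identical chunks hash to the same digest, so they are stored only once.
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = chunk
            self.refcount[digest] = 0
        self.refcount[digest] += 1
        return digest

    def _drop_ref(self, digest: str) -> None:
        # Free a block only when no file refers to it any more.
        self.refcount[digest] -= 1
        if self.refcount[digest] == 0:
            del self.refcount[digest]
            del self.blocks[digest]

    def write_file(self, name: str, data: bytes) -> None:
        # Split into blocks and keep only the token (digest) for each one.
        new_refs = [
            self._put_block(data[i:i + self.block_size])
            for i in range(0, len(data), self.block_size)
        ]
        for digest in self.files.get(name, []):
            self._drop_ref(digest)
        self.files[name] = new_refs

    def overwrite_block(self, name: str, index: int, chunk: bytes) -> None:
        # Copy-on-write: store the new contents as (possibly) a new block and
        # repoint this file; other files sharing the old block are unaffected.
        refs = self.files[name]
        new_digest = self._put_block(chunk)
        self._drop_ref(refs[index])
        refs[index] = new_digest

    def read_file(self, name: str) -> bytes:
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_size(self) -> int:
        # Bytes actually stored, after deduplication.
        return sum(len(b) for b in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore(block_size=4)
    store.write_file("a", b"AAAABBBBAAAA")
    store.write_file("b", b"AAAABBBB")
    print(store.physical_size())   # 8: only "AAAA" and "BBBB" are stored
    store.overwrite_block("b", 1, b"CCCC")
    print(store.read_file("a"))    # b'AAAABBBBAAAA' (unchanged)
    print(store.read_file("b"))    # b'AAAACCCC'
    print(store.physical_size())   # 12
```

Twenty logical bytes end up as eight physical bytes before the overwrite. Note that every write hashes every block and looks it up in the index, which is where the processor cost comes from.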

I haven't heard of any Linux filesystem that does dedup, at either the file or the block level. Such a beast would be handy, although fairly processor-intensive, since every block written has to be hashed and checked against the blocks already stored.