Solution 1:

Deduplication means examining the content of a data set, identifying the duplicate chunks it contains, storing each unique chunk just once, and replacing every other copy with a pointer back to that single stored copy. It is particularly helpful with backups, because when you back up things like servers, so much of the data is the same. Imagine, for instance, that you are backing up 1,000 Windows servers - much of the content on those boxes will be identical.
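To make the idea concrete, here is a minimal sketch of content-based deduplication in Python. It assumes fixed-size chunks and SHA-256 hashes purely for illustration; real products typically use variable-size chunking and far more sophisticated indexing, and the names (`dedup_store`, `chunk_store`, `restore`) are made up for this example.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; real products often use variable-size chunking

chunk_store = {}   # hash -> chunk data; each unique chunk is stored exactly once


def dedup_store(data: bytes) -> list[str]:
    """Split data into chunks, store each unique chunk once, return the chunk hashes ("pointers")."""
    pointers = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:   # only content we haven't seen before consumes space
            chunk_store[digest] = chunk
        pointers.append(digest)         # duplicate content just adds another pointer
    return pointers


def restore(pointers: list[str]) -> bytes:
    """Rebuild the original data by following the pointers back to the stored chunks."""
    return b"".join(chunk_store[p] for p in pointers)


# Two "backups" that share most of their content, like two similar Windows servers
backup_a = dedup_store(b"A" * 8192 + b"unique-to-server-a")
backup_b = dedup_store(b"A" * 8192 + b"unique-to-server-b")

# The shared 8 KB is stored once; only the differing tails add new chunks
print(len(chunk_store))                                           # 3 unique chunks instead of 6
print(restore(backup_a) == b"A" * 8192 + b"unique-to-server-a")   # True
```

The key point is that the second backup only adds the chunks that are actually new; everything it shares with the first backup is represented by pointers to data already on disk.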

Deduplication is so popular today for three reasons:

  1. Lately everyone is focused on building disaster recovery solutions that rely on off-site servers. To do this you have to replicate a lot of production data to the remote site, and bandwidth is a huge constraint. Any reduction in the amount of data you have to replicate helps a lot.

  2. The amount of data companies are retaining is exploding - thanks to cheaper storage and record-retention requirements across many industries.

  3. The technology has only recently hit its sweet spot. We've had deduplication-like features for a long time (single-instance storage, for example), and they helped, but only in the last year or so has real deduplication - the kind that can dramatically reduce the amount of storage you need - hit the mainstream.