I'm the IT everything-man at a small company. I want to design a new infrastructure, including a new server and a separate backup server with a company-wide backup policy.

The most important thing in the company is the SQL Server and its databases. There are 10 databases, but only 2 of them are really important. The first is 8GB, mostly text data and numbers. The second is about 300GB, growing by about 16GB/month, containing PDFs and GIFs.

To save storage, the current backup policy consists of one full backup per week and 6 differentials. I think that comes to about 350GB per week, or 1.4TB per month.
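
Rough math behind those numbers, as a sanity check (the differential sizes are assumptions on my part; I only really know the monthly growth):

    # Rough weekly/monthly backup volume estimate. FULL_GB comes from the two
    # important databases; DIFF_GB is an assumed per-differential size derived
    # from the ~16 GB/month growth, not a measured value.
    FULL_GB = 8 + 300              # one weekly full backup of both key databases
    DIFF_GB = 16 / 4               # assume each week's differentials together
                                   # roughly cover one week's worth of growth

    weekly_gb = FULL_GB + 6 * DIFF_GB    # 1 full + 6 differentials
    monthly_gb = 4 * weekly_gb

    print(f"~{weekly_gb:.0f} GB/week, ~{monthly_gb / 1000:.1f} TB/month")
    # -> roughly 330 GB/week and ~1.3 TB/month, close to the 350 GB / 1.4 TB figures above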

After reading some articles about silent data corruption, I decided to try ZFS with Nexenta Community Edition.

My question: is ZFS with deduplication good for storing backup files in terms of reliability, or should I think about tape backup or something else?

EDIT: I know that right now we cannot predict performance, deduplication ratio, etc., but I want to know whether it is a good idea at all.


Solution 1:

Certainly ZFS is plenty stable enough to do this kind of thing; there are many very large, high-profile, and reliable production platforms out there based entirely on ZFS and Nexenta.

That said, I always like to have on-site disk-based backups such as the one you're suggesting AND removable-disk or tape-based backups that go off-site daily to protect against fire/earthquake/Cthulhu etc.

So my answer is yes, it's fine, but I'd go for both options if you can.

Solution 2:

(assuming you're referring to using dedupe within ZFS versus your backup software)

I would not recommend using ZFS native deduplication for your backup system unless you design your storage system specifically for it.

Using dedupe in ZFS is extremely RAM intensive. Since deduplication occurs in real time as data is streamed/written to the storage pool, a table that keeps track of data blocks is maintained in memory. This is the DDT (deduplication table). If your ZFS storage server does not have enough RAM to accommodate this table, performance will suffer tremendously. Nexenta will warn you as the table grows past a certain threshold, but by then it's too late. The RAM can be supplemented with an L2ARC device (read cache), but many early adopters of ZFS fell into this trap.

See:

ZFS - destroying deduplicated zvol or data set stalls the server. How to recover?

ZFS - Impact of L2ARC cache device failure (Nexenta)

When I say the RAM requirement for dedupe is high, I'd estimate the needs for the data set you're describing at 64GB+ of RAM and 200GB+ of L2ARC. That's not a minor investment. Keeping lots of Windows system files and image documents that won't be reread will fill that DDT very quickly. The payoff may not be worth the engineering work that needs to go in upfront.
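
To put a rough number on that claim: a commonly quoted rule of thumb is a few hundred bytes of in-core DDT per unique block. The sketch below uses an assumed 320-byte entry, a 64KB average block size, and made-up retention figures, so treat it as an order-of-magnitude estimate rather than anything from Nexenta's documentation:

    # Rough DDT RAM estimate. Both constants are assumptions/rules of thumb,
    # not values measured from any particular pool.
    DDT_ENTRY_BYTES = 320     # assumed in-core size of one dedup-table entry
    AVG_BLOCK_KB = 64         # assumed average block size in the backup pool

    def ddt_ram_gb(unique_data_tb):
        """Approximate RAM consumed by the DDT for a given amount of unique pool data."""
        blocks = unique_data_tb * 1024**3 / AVG_BLOCK_KB   # TB -> KB, then blocks
        return blocks * DDT_ENTRY_BYTES / 1024**3          # bytes -> GB

    # e.g. a few months to a year of retained backups at ~1.4 TB/month
    for tb in (3, 6, 12):
        print(f"{tb:>2} TB of unique data -> ~{ddt_ram_gb(tb):.0f} GB of DDT")

Once the DDT no longer fits in RAM, it has to be served from L2ARC or from the pool itself, which is where the 200GB+ L2ARC figure comes from.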

A better idea is to use compression on the zpool, possibly leveraging the gzip capabilities for the more compressible data types. Deduplication won't be worth it, as there's also a performance hit when you need to delete deduplicated data (every delete has to reference the DDT).
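
As a minimal sketch of that approach (the dataset name is hypothetical; compression and compressratio are standard ZFS properties, and the same two zfs commands work directly from the Nexenta shell without the Python wrapper):

    import subprocess

    DATASET = "backuppool/sqlbackups"   # hypothetical pool/dataset name

    # gzip suits the text-heavy SQL data; the PDF/GIF-heavy database won't
    # compress much no matter which algorithm you pick.
    subprocess.run(["zfs", "set", "compression=gzip", DATASET], check=True)

    # After a few backup cycles, check what compression actually achieved.
    ratio = subprocess.run(
        ["zfs", "get", "-H", "-o", "value", "compressratio", DATASET],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    print(f"{DATASET} compressratio: {ratio}")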

Also, how will you be presenting the storage to your backup software? Which backup software suite will you be using? In Windows environments, I present ZFS as block storage to Backup Exec over iSCSI. I never found the ZFS CIFS features to be robust enough and preferred the advantages of a natively-formatted device.
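
For the block-storage route, the ZFS side is just a zvol that the Windows media server formats natively; the sketch below uses a hypothetical pool/volume name and leaves out the COMSTAR iSCSI target configuration, which is a separate step on Nexenta:

    import subprocess

    # Create a sparse 2TB zvol to export to the Backup Exec media server as an
    # iSCSI LUN; it then gets formatted NTFS on the Windows side.
    subprocess.run(
        ["zfs", "create", "-s", "-V", "2T", "backuppool/bexec-lun"],
        check=True,
    )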

Also, here's an excellent ZFS resource for design ideas: Things About ZFS That Nobody Told You.