Which filesystem for large LVM of disks (8 TB)?

I have a Linux server with many 2 TB disks, all currently joined into one LVM volume of about 10 TB. I use all this space as a single ext4 partition, and currently have about 8.8 TB of data on it.

Problem is, I often get errors on my disks. Even if I replace them as soon as errors appear (that is, I copy the old disk to a new one with dd, then put the new one in the server), I often end up with about 100 MB of corrupted data. That makes e2fsck go crazy every time, and it often takes a week to get the ext4 filesystem back into a sane state.
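For reference, the copy I do looks roughly like this (device names are just examples). As I understand it, plain dd aborts at the first read error, and with conv=noerror,sync the unreadable sectors are padded with zeros, which is probably where that corruption comes from:

    # dd with noerror,sync keeps going past read errors but writes
    # zeros in place of the unreadable sectors (silent corruption):
    dd if=/dev/sdb of=/dev/sdc bs=1M conv=noerror,sync

    # GNU ddrescue retries bad areas and keeps a map of them,
    # usually salvaging more data from a failing disk:
    ddrescue -f /dev/sdb /dev/sdc rescue.map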

So the question is: what filesystem would you recommend on top of my LVM? Or what would you recommend I do instead (I don't really need the LVM)?

Profile of my filesystem:

  • many folders of widely different total sizes (some totalling 2 TB, some totalling 100 MB)
  • almost 200,000 files of varying sizes (3/4 of them around 10 MB, 1/4 between 100 MB and 4 GB; I can't currently get more statistics, as my ext4 partition has been completely wrecked for some days)
  • many reads but few writes
  • and I need fault tolerance (I stopped using mdadm RAID because it doesn't tolerate a single error on a whole disk, and I sometimes have failing disks, which I replace as soon as I can, but that means I can get corrupted data on my filesystem)

The major problem is failing disks; I can afford to lose some files, but I can't afford to lose everything at once.

If I keep using ext4, I've heard that my best bet would be to make several smaller filesystems and "merge" them somehow, but I don't know how; see the sketch below for what I imagine that means.
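If I understand the suggestion, it would be several small logical volumes, each with its own ext4, mounted under a common tree, so that a fsck only ever covers one volume and a dying disk only hits the LVs that have extents on it. A minimal sketch (the volume group name vg0 and the mount points are hypothetical):

    # One 2 TB logical volume with its own ext4, mounted in a tree:
    lvcreate -L 2T -n data1 vg0
    mkfs.ext4 /dev/vg0/data1
    mkdir -p /srv/data/data1
    mount /dev/vg0/data1 /srv/data/data1

    # Repeat for data2, data3, ...; pinning an LV to a specific PV
    # makes it obvious which filesystem a failing disk affects:
    lvcreate -L 2T -n data2 vg0 /dev/sdc1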

I heard btrfs would be nice, but I can't find any clue as to how it handles losing part of a disk (or a whole disk) when data is NOT replicated (mkfs.btrfs -d single?).
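For reference, the layout I'm asking about would be created something like this (device names are examples). From what I've read, with -d single the file data on a lost disk is gone, but -m raid1 keeps the metadata mirrored, and checksums let a scrub report exactly which files are damaged instead of the whole filesystem going insane:

    # Data unreplicated, metadata mirrored across devices:
    mkfs.btrfs -d single -m raid1 /dev/sdb /dev/sdc /dev/sdd
    mount /dev/sdb /srv/data

    # A scrub verifies checksums and names the corrupted files:
    btrfs scrub start /srv/data
    btrfs scrub status /srv/data

    # Per-device error counters:
    btrfs device stats /srv/data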

Any advice on the question will be welcome, thanks in advance!


Solution 1:

It's not a filesystem problem, it's the disks' physical limitations. Here's some data:

SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 in 10^14 bits. That means roughly one unreadable sector per ~12 TB read, even on disks that are working fine.

This means that without redundancy you will lose data even if no drive outright fails - RAID is your only option.

If you choose RAID5 (usable capacity n-1, where n = number of disks), it's still not enough. With a 10 TB RAID5 made of 6 x 2 TB HDDs you have about a 20% chance of one drive failure per year, and once a single disk has failed, the URE rate leaves you only about a 50% chance of successfully rebuilding the RAID5 and recovering 100% of your data.
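The ~50% figure follows directly from the URE rate: a rebuild must read the surviving ~10 TB (about 8 x 10^13 bits) without hitting a single unrecoverable error, so

    P(\text{no URE during rebuild}) = \left(1 - 10^{-14}\right)^{8 \times 10^{13}} \approx e^{-0.8} \approx 0.45

i.e. the rebuild itself fails roughly half the time.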

Basically, with today's high disk capacities and this relatively high URE rate, you need RAID6 (two parity disks) to be safe even against a single disk failure, because the rebuild itself has to survive UREs.

Read this: http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

Solution 2:

Do yourself a favor and use RAID for your disks; it could even be software RAID with mdadm. Also think about why you "often get errors on your disks" - this is not normal, except when you use cheap desktop-class SATA drives instead of RAID-grade disks.
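A minimal sketch of what that could look like with mdadm (device names are placeholders, assuming six disks; RAID6 per Solution 1's reasoning):

    # Create a RAID6 array across six partitions; it survives two
    # simultaneous disk failures:
    mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]1
    mkfs.ext4 /dev/md0

    # Scrub regularly so latent UREs are found and rewritten while
    # redundancy still exists:
    echo check > /sys/block/md0/md/sync_action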

After that, the filesystem is not that important anymore - ext4 and XFS are both fine choices.

Solution 3:

I've had good luck with ZFS; check whether it's available on whatever distro you use. Fair warning, it'll probably mean rebuilding your whole system, but it gives really good performance and fault tolerance.
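As a sketch of what that could look like for this disk count (the pool name tank and the device names are hypothetical): raidz2 tolerates two failed disks, and end-to-end checksums detect - and, given redundancy, repair - exactly the kind of silent corruption described in the question:

    # Double-parity pool over six disks:
    zpool create tank raidz2 /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg
    zfs create tank/data

    # A periodic scrub reads everything and repairs bad blocks from parity:
    zpool scrub tank
    zpool status -v tank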