Linux fileserver storage pool

Is it possible to get a Linux server to share several hard drives as one storage pool through Samba? That way, when I use the share, I wouldn't have to worry about saving to a particular drive that still has free space. I would just save to the pool, and it would take care of all that.

I guess I could do this with some kind of (software) RAID, but a lot of my files don't really need to be on RAID. My aim is a setup with ~500 GB of RAID-protected space and a few TB of unprotected space.

I currently have a light W2008 server and would like to move to Linux, and this would be a really nice feature to have.


Solution 1:

On Linux, you could use LVM to gather several hard drives (Physical Volumes, PVs) into one Volume Group (VG). You would then divide the VG into the Logical Volumes (LVs) you wish to share through Samba.

See this link for more info.

[Figure: LVM schema]
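As a rough illustration of the LVM approach above, here is a dry-run sketch that only builds the command strings for pooling several drives into one share; it runs nothing. The drive names (/dev/sdb, /dev/sdc, /dev/sdd), the VG name "storage", the LV name "share", ext4 and the mount point /srv/share are all hypothetical choices, not part of the answer.

```python
"""Dry-run sketch: build the commands that pool several drives with LVM.

Hypothetical names: drives /dev/sdb etc., VG "storage", LV "share",
ext4 filesystem, mount point /srv/share. Nothing is executed.
"""
import shlex


def lvm_pool_plan(devices, vg="storage", lv="share", mountpoint="/srv/share"):
    """Return shell commands that turn `devices` into one LVM-backed share."""
    if not devices:
        raise ValueError("need at least one physical volume")
    devs = " ".join(shlex.quote(d) for d in devices)
    lv_path = f"/dev/{vg}/{lv}"
    return [
        # Mark each drive as an LVM physical volume (PV).
        f"pvcreate {devs}",
        # Gather all PVs into a single volume group (VG).
        f"vgcreate {vg} {devs}",
        # Carve one logical volume (LV) out of all free space in the VG.
        f"lvcreate -l 100%FREE -n {lv} {vg}",
        # Put a filesystem on the LV and mount it where Samba will share it.
        f"mkfs.ext4 {lv_path}",
        f"mkdir -p {mountpoint}",
        f"mount {lv_path} {mountpoint}",
    ]


if __name__ == "__main__":
    for cmd in lvm_pool_plan(["/dev/sdb", "/dev/sdc", "/dev/sdd"]):
        print(cmd)
```

The mount point would then be exported with an ordinary share section in smb.conf (for example a `[share]` section with `path = /srv/share`). Samba only sees one directory and never needs to know how many drives sit underneath it.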

Solution 2:

I would second Raphink's suggestion of LVM (and upvote it, in fact) - this is pretty much exactly what LVM is designed for and it works well in my experience.

One thing to note is that using LVM over a bunch of drives is only a little safer than RAID0. If one drive goes down, you may lose more than one drive's worth of filesystems, because logical volumes can straddle drive boundaries. So while "a lot of my files don't really need to be raided", make sure you have a good backup plan for any files that are not easy to replace.
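The straddling problem described above can be shown with a small, pessimistic model. It uses a hypothetical layout of three 1000 GB drives and two LVs whose extents cross drive boundaries. Any LV with extents on the failed drive is counted as lost, on the assumption that its filesystem is likely damaged beyond simple repair.

```python
"""Pessimistic sketch: which logical volumes does one dead drive take out?

Hypothetical layout: three 1000 GB drives, two LVs straddling them.
Any LV with extents on the failed drive is counted as lost.
"""

# LV name -> {drive: GB of that LV's extents on that drive}
LAYOUT = {
    "media": {"sdb": 1000, "sdc": 500},
    "docs": {"sdc": 500, "sdd": 500},
}


def affected_lvs(layout, failed_drive):
    """Return the names of LVs that have extents on `failed_drive`."""
    return sorted(lv for lv, extents in layout.items() if failed_drive in extents)


def data_at_risk(layout, failed_drive):
    """Total GB in all LVs that touch the failed drive."""
    return sum(sum(layout[lv].values()) for lv in affected_lvs(layout, failed_drive))


if __name__ == "__main__":
    for drive in ("sdb", "sdc", "sdd"):
        print(drive, affected_lvs(LAYOUT, drive), data_at_risk(LAYOUT, drive), "GB")
```

In this made-up layout, losing sdc (one 1000 GB drive) puts both LVs, 2500 GB of filesystems in total, at risk.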

If the storage server is likely to see a lot of activity, you might consider using RAID0 as well as LVM. This will give you a significant performance boost for many I/O patterns and will not reduce the total storage space available, because RAID0 offers no redundancy and so uses no space for mirrored data or parity blocks. Once you have tied some drives together in a RAID0 array, you can make the array an LVM PV just like any other drive or partition, and use logical volumes to divide up the space as needed.
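Here is a dry-run sketch of that RAID0-then-LVM stacking. It assumes Linux software RAID (mdadm) and uses hypothetical names: member drives /dev/sdb and /dev/sdc, array /dev/md0 and VG "fast". It only returns the command strings.

```python
"""Dry-run sketch: stripe drives with mdadm RAID0, then hand the array to LVM.

Hypothetical names: /dev/md0 for the array, "fast" for the volume group.
Nothing is executed; the commands are only returned as strings.
"""


def raid0_lvm_plan(devices, md="/dev/md0", vg="fast"):
    """Return commands that build a RAID0 array and make it an LVM PV."""
    if len(devices) < 2:
        raise ValueError("RAID0 striping needs at least two devices")
    devs = " ".join(devices)
    return [
        # Stripe the drives together; RAID0 keeps no mirror or parity data.
        f"mdadm --create {md} --level=0 --raid-devices={len(devices)} {devs}",
        # From here on the array is just another block device to LVM.
        f"pvcreate {md}",
        f"vgcreate {vg} {md}",
    ]


if __name__ == "__main__":
    for cmd in raid0_lvm_plan(["/dev/sdb", "/dev/sdc"]):
        print(cmd)
```

To add the array to an existing volume group instead, `vgextend storage /dev/md0` would replace the `vgcreate` step ("storage" again being a hypothetical VG name).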

Of course, with RAID0 you pretty much definitely lose everything if a drive dies. But if you have a backup plan that is sufficient for JBOD, it is also sufficient for data on RAID0. JBOD is effectively what you get with plain LVM; some RAID references call it "linear mode". Anyone who says otherwise isn't paranoid enough about the data they have spread over multiple drives with LVM!
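That reasoning can be made concrete with a rough model. It assumes that drives fail independently, each with the same probability p over some period. The p = 0.03 used below is a made-up figure, not measured data. Under that model, the chance that some drive dies (and so that a restore is needed at all) is 1 - (1 - p)^n for both linear LVM and RAID0. Only the amount you have to restore differs.

```python
"""Rough model: how often does a linear (JBOD) or RAID0 set need a restore?

Assumes independent drive failures with equal probability p per period;
the p used below is hypothetical, not real failure-rate data.
"""


def p_any_failure(p, n):
    """Probability that at least one of n drives fails in the period."""
    return 1 - (1 - p) ** n


def expected_loss_fraction(p, n, layout):
    """Expected fraction of the pool's data lost in the period (simplified).

    RAID0: any failure loses everything. Linear: each drive holds 1/n of
    the data and fails with probability p, so the expectation is p. This
    ignores LVs straddling drives, so it understates the linear-mode loss.
    """
    if layout == "raid0":
        return p_any_failure(p, n)
    if layout == "linear":
        return p
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    p, n = 0.03, 4
    print("P(some drive fails):", p_any_failure(p, n))
    for layout in ("linear", "raid0"):
        print(layout, expected_loss_fraction(p, n, layout))
```

With four drives, the event your backups must cover (some drive dying) has the same ~11.5% chance either way under these assumptions. RAID0 simply turns every such event into a full restore.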

You can mix and match RAID levels on the same drives. One of the machines under my control runs as a VM host for development and testing. The drives in its array have some parts set up as RAID0 (for the VMs themselves) and some as RAID1 (for backups of the VMs). Each drive is split into ~100 GB partitions, and each set of corresponding partitions across the drives can be a RAID array. Currently, the first two on each drive are RAID0 (linked into one logical volume by LVM) and the last three are RAID1 (again linked into one LV). The ones left in the middle are free to become either when more space is needed.

If you do this, you of course need to be wary of I/O contention. You also need to watch for excess head movement when both arrays are in active use, as they sit at opposite ends of the drives. Neither is a problem in my case, because the two arrays rarely see noticeable activity at the same time: the RAID1 set is only ever accessed when making a new backup or restoring an old one. You should be aware of those issues when using plain LVM without any RAID anyway. This arrangement meant that I didn't have to guess accurately, at install time, how much high-speed (R0) space versus high-safety (R1) space the machine would need X months down the line. Nor did I have to compromise by just going for a single RAID10 array.
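Here is a small capacity model of that layout. The answer only gives the ~100 GB partition size and the "two RAID0, free middle, three RAID1" ordering. The two drives and seven partitions per drive are hypothetical. Partition i on every drive is assumed to form set i.

```python
"""Model of per-partition RAID sets sharing the same physical drives.

Hypothetical layout: 2 drives, each cut into 7 partitions of 100 GB.
Partition i on every drive forms set i; each set is either a RAID0
array, a RAID1 array, or still unassigned ("free").
"""

PART_GB = 100
LAYOUT = ["raid0", "raid0", "free", "free", "raid1", "raid1", "raid1"]


def usable_gb(level, n_drives, part_gb=PART_GB):
    """Usable space of one partition set built at the given RAID level."""
    if level == "raid0":
        return n_drives * part_gb  # striped: every member adds capacity
    if level == "raid1":
        return part_gb  # mirrored: every member holds a full copy
    raise ValueError(f"unknown level: {level}")


def tier_totals(layout, n_drives, part_gb=PART_GB):
    """Sum usable space per tier and count sets still left unassigned."""
    totals = {"raid0": 0, "raid1": 0}
    free_sets = 0
    for level in layout:
        if level == "free":
            free_sets += 1
        else:
            gb = usable_gb(level, n_drives, part_gb)
            totals[level] += gb
    return totals, free_sets


if __name__ == "__main__":
    totals, free_sets = tier_totals(LAYOUT, n_drives=2)
    print(totals, "free sets:", free_sets)
    print("each free set adds", usable_gb("raid0", 2), "GB fast or",
          usable_gb("raid1", 2), "GB safe")
```

Each RAID0 set would be an LVM PV in the fast volume group and each RAID1 set a PV in the safe one. Converting a free set later then means creating the array and running `vgextend` on the matching VG.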

Solution 3:

I would also recommend taking a look at ZFS, which has drive pooling and RAID-style mirroring built in.

There are some licensing issues that make it a little awkward to implement on Linux, but I have been running it on my home system with 3 x 1TB drives without any problems.
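Here is a dry-run sketch of how ZFS could cover the original goal of a mirrored area plus plain bulk space. It assumes four hypothetical disks, pool names "safe" and "bulk", and dataset names "documents" and "media". It builds two separate pools because ZFS loses a whole pool if any of its top-level vdevs fails. A non-redundant disk in the same pool as the mirror would therefore put the mirrored data at risk too. In practice, /dev/disk/by-id paths are usually preferred over /dev/sdX names.

```python
"""Dry-run sketch: one mirrored ZFS pool for important data, one plain pool.

Hypothetical disk, pool and dataset names; nothing is executed.
"""


def zfs_plan(mirror_pair, bulk_disks, safe="safe", bulk="bulk"):
    """Return zpool/zfs commands for a mirrored pool plus a striped pool."""
    if len(mirror_pair) != 2:
        raise ValueError("expected exactly two disks for the mirror")
    if not bulk_disks:
        raise ValueError("need at least one disk for the bulk pool")
    return [
        # Two-way mirror: either disk can die without losing data.
        f"zpool create {safe} mirror {' '.join(mirror_pair)}",
        # Plain striped pool: all capacity usable, no redundancy.
        f"zpool create {bulk} {' '.join(bulk_disks)}",
        # Datasets (filesystems) to point Samba shares at.
        f"zfs create {safe}/documents",
        f"zfs create {bulk}/media",
    ]


if __name__ == "__main__":
    for cmd in zfs_plan(["/dev/sdb", "/dev/sdc"], ["/dev/sdd", "/dev/sde"]):
        print(cmd)
```

By default, each dataset is mounted under /<pool>/<dataset> (e.g. /safe/documents). It can then be exported as a Samba share like any other directory.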