Is there a Linux utility that can distribute files to an array of hard disks, like a load balancer, but for storage?

I have 2 1TB disks not in any RAID configuration. I'd like files I need to store to be placed on one of the disks depending on the capacity of the disks, and when accessing the file I suppose I'd need to find the file via a database containing a file map, or by using a hash. Are there any Linux utilities that provide this, or should I just create a PHP script?

Thanks


Considering how cheap 1TB disks are, get a third one and create a RAID 5 array: redundancy and storage.


Three years late, but still relevant. I had the same problem, and fake RAID was simply out of the question. Use AUFS: it merges the drives under a single mount point. The `create=mfs` policy puts new files on the branch with the most free space. There is also `rr`, which is round robin, and `pmfs`, which puts a file on the branch that already contains its parent directory and, among those, has the most free space. I personally use pmfs. My setup works like so.

The fstab:

# The Archive
LABEL=Archive_000 /mnt/Archive_000 ext4 defaults 0 0
LABEL=Archive_001 /mnt/Archive_001 ext4 defaults 0 0
LABEL=Archive_002 /mnt/Archive_002 ext4 defaults 0 0
LABEL=Archive_003 /mnt/Archive_003 ext4 defaults 0 0
LABEL=Archive_004 /mnt/Archive_004 ext4 defaults 0 0
LABEL=Archive_005 /mnt/Archive_005 ext4 defaults 0 0
LABEL=Archive_006 /mnt/Archive_006 ext4 defaults 0 0
LABEL=Archive_007 /mnt/Archive_007 ext4 defaults 0 0
LABEL=Parity      /mnt/Parity      ext4 defaults 0 0
LABEL=Q-Parity    /mnt/Q-Parity    ext4 defaults 0 0

I added an init.d script, because at boot the drives mount too slowly for an aufs entry in fstab to succeed:

d0="/mnt/Archive_000"
d1="/mnt/Archive_001"
d2="/mnt/Archive_002"
d3="/mnt/Archive_003"
d4="/mnt/Archive_004"
d5="/mnt/Archive_005"
d6="/mnt/Archive_006"
d7="/mnt/Archive_007"

mount -t aufs -o noxino -o br=$d0=rw:$d1=rw:$d2=rw:$d3=rw:$d4=rw:$d5=rw:$d6=rw:$d7=rw -o create=pmfs -o sum none /mnt/Archive

That gives me ten mounts under /mnt, plus the aufs union at /mnt/Archive. I like it this way because I use SnapRAID, which you'll have to download and compile (there are guides for it). I use this on a Samba server, so the only thing everyone else sees is the Archive folder. Make sure to create the /mnt/Archive directory first, otherwise you'll get a mount error.
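For context, a SnapRAID configuration matching this layout would look roughly like the following. This is a sketch, not my actual file: the content-file locations and disk names are assumptions, and older SnapRAID versions call the second parity `q-parity` while newer ones use `2-parity`.

```
# /etc/snapraid.conf (sketch)
parity   /mnt/Parity/snapraid.parity
q-parity /mnt/Q-Parity/snapraid.q-parity
content  /mnt/Archive_000/snapraid.content
content  /mnt/Archive_001/snapraid.content
disk d0 /mnt/Archive_000
disk d1 /mnt/Archive_001
disk d2 /mnt/Archive_002
disk d3 /mnt/Archive_003
disk d4 /mnt/Archive_004
disk d5 /mnt/Archive_005
disk d6 /mnt/Archive_006
disk d7 /mnt/Archive_007
```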


Greyhole will distribute your files across multiple drives. It will also allow you to specify redundancy, so that certain files have redundant copies stored on multiple drives. It is targeted at the home server or workstation and not as a production enterprise solution.


It sounds like all you care about is being able to utilize all 2TB of storage without having to manually place files on one drive or the other. Either LVM or RAID0 can solve this problem for you, at the expense of an increased risk of failure (losing either disk loses the whole pool). For LVM, you would make each 1TB drive an LVM physical volume and put them both in a single volume group; after that you could create logical volumes up to 2TB in size. For RAID0, you'd just create the RAID device.

# pretending your unused 1TB disks are sdy and sdz
# for LVM
pvcreate /dev/sdy /dev/sdz
vgcreate myvg /dev/sdy /dev/sdz
lvcreate --name mylv --extents 100%FREE myvg
# for RAID0
mdadm --create /dev/md0 --raid-devices 2 --level 0 /dev/sdy /dev/sdz

I don't know of a way to transparently merge separate filesystems into a single storage pool. This sort of sharding isn't uncommon; it's just typically implemented at the application layer rather than the storage layer. Engine Yard has a paper describing filesystem sharding tactics and processes.
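If you do go the application-layer route the question mentions (a script plus a file map), the core idea is small. A minimal Python sketch, where the mount points and the in-memory map are purely illustrative:

```python
import os
import shutil

def pick_disk(mounts, free_bytes=lambda m: shutil.disk_usage(m).free):
    """Return the mount point with the most free bytes."""
    return max(mounts, key=free_bytes)

def store(src, mounts, file_map):
    """Copy src onto the emptiest disk and record where it went."""
    dest = os.path.join(pick_disk(mounts), os.path.basename(src))
    shutil.copy2(src, dest)
    file_map[os.path.basename(src)] = dest  # name -> full path
    return dest

def fetch(name, file_map):
    """Look up a stored file's location by name."""
    return file_map[name]
```

In practice you'd persist `file_map` somewhere durable (SQLite, for instance) instead of keeping it in memory; the free-space probe is injectable here only so the placement logic can be exercised without real disks.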