How is data (a file) stored in a RAID-5 array?

Solution 1:

A traditional RAID is an array of disks presented as a single block device (volume). It doesn't know anything about the filesystem and/or files on top of it. The same goes the other way: filesystems usually don't know whether there is a RAID behind the storage.

This means that how your files are stored on the RAID depends heavily on how the filesystem stores them, so it's better to talk about raw data stored on the block device.

Assume you use raw access to change a single byte on the RAID volume. The underlying RAID subsystem (hardware or software) works out which stripe is affected and, in basic operation (ignoring caching and other optimizations), has to update that stripe: at minimum the chunk holding the byte and the stripe's parity chunk, and in the simplest implementation the whole stripe.
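To make "which stripe is affected" concrete, here is a minimal sketch of how a byte offset on the volume could be mapped to a stripe, a data drive and a parity drive. The chunk size, the drive numbering and the rotating-parity layout are assumptions chosen for illustration; real controllers and software RAID implementations use their own layouts, and `locate` is a hypothetical helper, not part of any RAID tool.

```python
CHUNK_SIZE = 64 * 1024  # assumed chunk (stripe unit) size in bytes
N_DISKS = 5             # 4 data chunks + 1 parity chunk per stripe


def locate(offset, n_disks=N_DISKS, chunk_size=CHUNK_SIZE):
    """Map a volume byte offset to its stripe and drives (hypothetical layout)."""
    data_disks = n_disks - 1
    chunk_no = offset // chunk_size      # index of the data chunk on the volume
    stripe = chunk_no // data_disks      # which stripe that chunk belongs to
    data_idx = chunk_no % data_disks     # position of the chunk within the stripe
    # Assumed rotation: parity moves one drive to the left on each stripe,
    # and data chunks follow the parity drive, wrapping around.
    parity_disk = data_disks - (stripe % n_disks)
    data_disk = (parity_disk + 1 + data_idx) % n_disks
    return {
        "stripe": stripe,
        "data_disk": data_disk,
        "parity_disk": parity_disk,
        "offset_in_chunk": offset % chunk_size,
    }
```

With these assumptions, `locate(0)` and `locate(1)` land in the same chunk of the same stripe, which is why two adjacent single-byte writes hit the same stripe twice.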

If the next operation (again assuming no caching, for simplicity) updates the byte next to the previous one, and the RAID subsystem finds it is in the same stripe, it will update that stripe again. Simply put, this is a read-modify-write operation: it reads the previous contents, changes only the required byte, recalculates the parity block and writes the new data back. In the simplest case this touches all 5 drives; many implementations read and write only the affected data chunk and the parity chunk. In reality, caching and smarter algorithms prevent this from happening on every write, so physical writes occur only when necessary.
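The parity recalculation in a read-modify-write can be sketched with plain XOR, which is what RAID-5 parity is. The following is a toy in-memory model, not a real RAID implementation: the function names and the tiny 8-byte chunks are made up for illustration. It shows that the new parity can be derived from the old parity, the old data chunk and the new data chunk, without reading the other data chunks.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(chunks):
    """Parity chunk of a stripe: XOR of all its chunks."""
    parity = bytes(len(chunks[0]))  # all zeros
    for chunk in chunks:
        parity = xor_bytes(parity, chunk)
    return parity


def small_write(chunks, parity, idx, pos, value):
    """Read-modify-write of one byte; returns the updated parity chunk."""
    old = chunks[idx]                                # read old data chunk
    new = old[:pos] + bytes([value]) + old[pos + 1:]  # modify a single byte
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_bytes(xor_bytes(parity, old), new)
    chunks[idx] = new                                # write data chunk back
    return new_parity                                # caller writes parity back


def rebuild(chunks, parity, lost):
    """Recover the chunk at index `lost` from the survivors plus parity."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return compute_parity(survivors + [parity])


# One stripe of a 5-drive array: 4 data chunks and their parity.
stripe = [bytes([i] * 8) for i in range(1, 5)]
parity = compute_parity(stripe)
parity = small_write(stripe, parity, idx=2, pos=3, value=0xFF)
```

After the small write, the updated parity equals what a full recomputation over the stripe would give, and `rebuild(stripe, parity, 2)` still reproduces the modified chunk.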

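So in your example with 5 drives in a RAID-5 array, writing even the smallest 1-byte file costs a stripe update. The byte itself lives in one data chunk on one drive, and the parity chunk for that stripe has to be updated too (it sits on another drive, since RAID-5 rotates parity across the drives rather than using a dedicated parity drive). The simplest implementation rewrites the whole stripe across all 5 drives. But that doesn't mean the file takes more filesystem space than it would on a single non-RAID drive. The RAID volume provides exactly the space it advertises: N-1 drives' worth, minus a small amount of metadata.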
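The capacity arithmetic is simple enough to write down. This is a rough sketch only: `raid5_usable` is a hypothetical helper, and the per-drive metadata figure is a placeholder, since the actual overhead depends on the controller or software RAID in use.

```python
def raid5_usable(n_drives, drive_size, metadata_per_drive=0):
    """Usable RAID-5 capacity: one drive's worth of space goes to parity."""
    if n_drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (n_drives - 1) * (drive_size - metadata_per_drive)


# Five 1000 GB drives, metadata ignored: 4000 GB usable.
usable_gb = raid5_usable(5, 1000)
```

A 1-byte file on that volume still consumes whatever the filesystem allocates for it (typically one filesystem block), the same as on a single drive.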