Does adding heaps of drives to a raid 0 increase performance?

Solution 1:

In theory, yes: more drives in a raid0 should lead to higher performance because the load is shared over more drives. In practice, however, you would be limited by the bandwidth of the raid controller, the CPU and memory performance, and so on. The performance increase will not be linear; that is, 4 disks are not exactly twice as fast as 2 disks.

In any reasonably modern system with a raid controller, or even using software raid with Linux's mdadm, using 8 drives will be faster than using 2, and you should not be held back by the rest of the system's performance: the CPU, the raid and/or disk controller, and the memory should all be able to handle it. You may see increased use of system resources the more drives you add, especially if you use the onboard SATA controller in a softraid combination, but nothing that would really hinder overall usability. If using Linux, you may want a kernel configured without "preempt", so that server-oriented throughput is favoured over desktop responsiveness.

https://rt.wiki.kernel.org/index.php/RT_PREEMPT_HOWTO
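
To make the 8-drive case concrete, here is a minimal sketch of how such a raid0 could be assembled with mdadm and given a quick sequential-read check; the device names /dev/sdb through /dev/sdi and the chunk size are assumptions, so adjust them for your system:

    # Build a striped (raid0) array across 8 whole disks (assumed names).
    mdadm --create /dev/md0 --level=0 --raid-devices=8 --chunk=512 /dev/sd[b-i]

    # Confirm the array assembled and check its layout.
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # Rough sequential-read figure straight from the block device.
    hdparm -t /dev/md0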

Of course, the more drives you add, the higher the chance that one of them fails and the whole raid is destroyed. I would not expect a raid0 of 8 drives to last more than a year or two, if you're lucky. A raid0 of 16 drives would be asking for trouble; at that point I'd consider a raid10 instead, which would still be fast enough and would give you less to worry about.

As for how many drives would max out a system's resources, I couldn't say without detailed system specs. I think you'd be limited more by the failure rate if you go over about 16 disks (I'd rather not think about it).
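
To put a rough number on that: a raid0 is lost as soon as any single member dies, so if each drive independently has an annual failure probability p (the 5% used below is just an assumed ballpark figure, not a measured rate), the chance of losing the array within a year grows quickly with the drive count N:

    P(\text{array lost in a year}) = 1 - (1 - p)^{N}
    N = 8:\quad 1 - 0.95^{8} \approx 0.34
    N = 16:\quad 1 - 0.95^{16} \approx 0.56

Under that assumed rate, a 16-drive raid0 is more likely than not to be gone within the year, which is why a raid10 starts to look attractive.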

Naturally you'd only use the raid0 for data that can be lost at any time without problems. It works great for things such as a build server, or scratch space for large scientific computations. In fact, those are the scenarios I have most often used a raid0 for, and it is a great way to squeeze a bit more life out of a bunch of older, lower-capacity, inexpensive disks that would otherwise be collecting dust. You can even mix sizes, at least with mdadm.

If using mdadm, it may be worth considering a raid10 instead, since in certain configurations it gets near the performance of a raid0: the read performance of a raid0, plus write performance that already beats the other raid levels (except raid0 itself). You would also get better redundancy than most other raid levels, with only a slight speed penalty compared to a raid0. That is close to the best of both worlds, which you don't find often.

https://en.wikipedia.org/wiki/RAID#Non-standard_levels

Linux MD RAID 10 provides a general RAID driver that in its "near" layout defaults to a standard RAID 1 with two drives, and a standard RAID 1+0 with four drives; though, it can include any number of drives, including odd numbers. With its "far" layout, MD RAID 10 can run both striped and mirrored, even with only two drives in f2 layout; this runs mirroring with striped reads, giving the read performance of RAID 0. Regular RAID 1, as provided by Linux software RAID, does not stripe reads, but can perform reads in parallel.
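
As a sketch of what that looks like with mdadm (device names are assumptions), the "far 2" layout gives mirrored writes with raid0-style striped reads even on just two disks:

    # Two disks, "far" layout (f2): data is mirrored, reads are striped.
    mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sdb /dev/sdc

    # The same level with four disks and the "near" layout (n2) would be
    # the classic raid1+0 arrangement instead:
    #   mdadm --create /dev/md1 --level=10 --layout=n2 --raid-devices=4 /dev/sd[d-g]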

As suggested in the comments, mixing sizes with mdadm will not give a speed increase if you utilise all disk space as opposed to letting the smallest disk define the size of the array.
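
One way around that (a sketch, assuming one roughly 500 GB disk and one 1 TB disk) is to give every member the same size, for example by partitioning each disk down to the capacity of the smallest one and striping over the partitions; the leftover space on the larger disk then simply stays outside the array:

    # Equal-sized partitions on both disks, sized to fit the smaller one.
    parted /dev/sdb --script mklabel gpt mkpart primary 1MiB 450GiB
    parted /dev/sdc --script mklabel gpt mkpart primary 1MiB 450GiB

    # Stripe over the partitions rather than the whole disks.
    mdadm --create /dev/md2 --level=0 --raid-devices=2 /dev/sdb1 /dev/sdc1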

Also, seek time will not improve in a raid0 and can even become a bit slower. For an SSD-based raid0 the access time is so small (between 0.08 and 0.16 ms, see https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics#cite_note-HP_SSD-6) that I expect it wouldn't matter much.
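
If you want to verify that on your own hardware, a small-block random-read run (a sketch using fio; the target device is an assumption) isolates access time from the streaming bandwidth that a raid0 is actually good at:

    # 4 KiB random reads at queue depth 1: the result is dominated by
    # access time, not by sequential throughput.
    fio --name=randread --filename=/dev/md0 --readonly --direct=1 \
        --rw=randread --bs=4k --iodepth=1 --ioengine=libaio \
        --runtime=30 --time_based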

Solution 2:

It depends on the workload, but IMHO yes, adding 2 additional disks to an existing 2-disk array should give better overall performance.

You need to realize where the bottlenecks are (a quick way to check each one is sketched after the list):

  • CPU - how much data flow the CPU can handle,
  • bus/controller - how much data it can carry,
  • SSD/HDD - how much data each disk can deliver or take in.
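
A quick way to see which of these is the limit (a sketch; iostat comes from the sysstat package) is to watch per-device utilisation and CPU load while the array is busy:

    # Per-device utilisation and throughput, refreshed every second.
    iostat -x 1

    # CPU usage, including the md raid kernel threads.
    top

    # State of the software raid itself (resync, degraded members, ...).
    cat /proc/mdstat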

Let's assume a Linux software RAID; adding two additional disks MAY then result in:

  • ~ half the access time to a big enough block of data, which results in
  • ~ double the IOPS,
  • ~ double the throughput, assuming the bus/controller has sufficient bandwidth and the CPU can handle the traffic.

* ~ it is never a full two-times boost in these factors; expect 10-20% less. It does look more or less linear, though. Please don't treat this as an authoritative answer, I haven't done any studies on it.
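
If you want to see how close to linear it is on your own machine, a rough before/after comparison (a sketch; the device name and sizes are assumptions, and dd only measures streaming reads) would be:

    # Sequential read from the 2-disk array, bypassing the page cache.
    dd if=/dev/md0 of=/dev/null bs=1M count=4096 iflag=direct

    # Rebuild the array with 4 disks, then run the same command against
    # it and compare the MB/s figures.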