Solution 1:

Hmm, I'm not sure I would want to mix two such different volumes in a single RAID1. If you do that, half of your requests will be served from the slower EBS volume and half from the faster instance storage, which may lead to quite unpredictable performance. I would look at standard tools to get better performance.

Look at Provisioned IOPS EBS volumes (if you need high random-access IO) or Throughput Optimized EBS volumes (if you're sequentially reading large files). They may provide the performance you need out of the box. Pricing is listed on the Amazon EBS pricing page.

You should also look at caching, especially since, as you say, the content is mostly read-only. Every time a file is needed, look in a local cache directory on the ephemeral storage and, if it's there, serve it from there. If not, read it from EBS and save a copy in the cache. If it's all read-only, this should be quite a simple caching layer.
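A minimal sketch of such a read-through file cache, assuming the EBS volume and the ephemeral volume are both mounted as ordinary directories. The function name `cached_path` and its parameters are hypothetical, chosen only for illustration, and the sketch assumes files on EBS never change after they are written.

```python
import shutil
import tempfile
from pathlib import Path


def cached_path(relative_name: str, source_dir: Path, cache_dir: Path) -> Path:
    """Return a path on the fast cache volume, copying from the slow volume on a miss.

    source_dir is assumed to be the EBS mount and cache_dir a directory on the
    ephemeral instance storage. Both names are illustrative.
    """
    cached = cache_dir / relative_name
    if cached.exists():
        return cached  # cache hit: serve from ephemeral storage

    source = source_dir / relative_name
    # Open the source first so a missing file fails before anything is created.
    with source.open("rb") as src:
        cached.parent.mkdir(parents=True, exist_ok=True)
        # Copy into a temporary file in the same directory, then rename it,
        # so concurrent readers never see a partially written cache entry.
        with tempfile.NamedTemporaryFile(dir=cached.parent, delete=False) as tmp:
            shutil.copyfileobj(src, tmp)
    Path(tmp.name).replace(cached)
    return cached
```

Because the ephemeral storage is wiped when the instance stops, the cache simply starts empty and refills on demand; nothing here needs to survive a restart.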

Or, if the files on EBS are database files (which I suspect may be the case), cache the results of your queries or processing in Memcached or Redis, or in the database's native cache (e.g. the MySQL Query Cache, available up to MySQL 5.7).
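As a rough illustration of caching query results, here is a small in-process cache with a time-to-live. It is only a stand-in for Memcached or Redis, not their client API; the class name `TTLCache` and the method `get_or_compute` are made up for this sketch.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-process stand-in for Memcached or Redis (illustrative only)."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl_seconds = ttl_seconds
        # Maps a cache key to (expiry time, cached value).
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, calling compute() on a miss or expiry."""
        now = time.monotonic()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: skip the expensive query
        value = compute()  # miss or expired: run the real query
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

The key would typically be the query text plus its parameters, and `compute` a function that actually hits the database. With mostly read-only data a long TTL is reasonable.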

Hope that helps :)

Solution 2:

40GB is small enough for a RAM disk, which will be faster than scratch disks. How long will your app run, and is it worth paying for an instance with a larger memory allocation?

Running it 24x7 may be too costly, but 40GB of RAM is within reach.

As a bonus you should enjoy more cores.

I agree with query caching for deterministic queries, and any sort of buffering will help over time.

Solution 3:

I... wouldn't use a RAID1 volume, even with --write-mostly. The performance degradation while the set rebuilds is going to get annoying.

What I would recommend looking into instead is bcache. I've found it very useful when I have access to SSDs but also have a very large amount of data to store (usually very large PostgreSQL databases), for which buying all-SSD storage isn't cost-effective. I've only used it in "persistent" mode, where it uses the SSDs as a write-back cache, but it also has a writethrough mode, where the cache layer is treated as ephemeral and no write is considered complete until it is on the underlying permanent storage.