100 TB database on PostgreSQL without sharding

Is it realistic to set up a 100 TB database (about 90 TB, actually) on PostgreSQL without sharding the data across a number of nodes? Are there any success stories / examples of similar setups?


Absorbing 50K writes per second is usually more than a challenge. Even in synthetic benchmarks with quite simple inserts, PostgreSQL tends to max out at roughly 10K/s - and there you don't even have such a large beast of a database to contend with.
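
How far from that figure you land depends heavily on how the inserts are grouped. As a rough sketch (the table below is made up purely for illustration), row-at-a-time statements versus batched loading is usually what decides which side of that number you end up on:

    -- Hypothetical append-only table, used only for illustration
    CREATE TABLE events (
        id         bigserial PRIMARY KEY,
        payload    text,
        created_at timestamptz NOT NULL DEFAULT now()
    );

    -- Worst case: one row per statement, one statement per transaction,
    -- so every single row pays for its own WAL flush
    INSERT INTO events (payload) VALUES ('a');

    -- Better: many rows per statement inside a single transaction,
    -- amortising the commit cost over the whole batch
    BEGIN;
    INSERT INTO events (payload) VALUES ('a'), ('b'), ('c');  -- in practice hundreds of rows
    COMMIT;

    -- Best for bulk loading: COPY streams rows with minimal per-row overhead
    -- (run through psql or a client driver)
    COPY events (payload) FROM STDIN;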

The I/O system for that single PostgreSQL node is also going to be interesting. Even with RAID 10, and assuming that 50K inserts translate to just 50K IOPS (which is probably wrong, but it depends on your database schema and indexes), you are going to need roughly a hundred disks, paired with a very good array that saves you from buying several hundred of them to service those writes in a timely manner.
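
As a back-of-the-envelope sketch, assuming roughly 200 random write IOPS per 15K RPM spindle and the usual RAID 10 write penalty of 2 (both assumptions, not measurements):

    50,000 writes/s x 2 (RAID 10 mirroring)  = 100,000 disk writes/s
    100,000 / ~200 IOPS per spindle          = ~500 spindles with no write cache

That gap is what the very good array closes: a large write cache coalesces those writes so that roughly a hundred disks can keep up.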

If sharding is easy and you expect such a huge write load, then go for sharding. Writes can be very difficult to scale.


It is realistic and will work. Performance largely depends on how much RAM you have. The more RAM, the bigger the cache, and the longer PostgreSQL can hold data in cache before writing it out to disk.
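
As a sketch of the knobs involved (the values below are placeholders, not recommendations - they have to be sized against the actual machine):

    -- Give PostgreSQL a meaningful slice of RAM for its own buffer cache
    -- (changing shared_buffers requires a restart to take effect)
    ALTER SYSTEM SET shared_buffers = '64GB';

    -- Tell the planner how much data the OS page cache is likely to hold on top of that
    ALTER SYSTEM SET effective_cache_size = '192GB';

    SELECT pg_reload_conf();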

PostgreSQL writes data to its cache and flushes that cache to disk from time to time, so 50K INSERTs per second will not translate to 50K IOPS. It will be far less, because PostgreSQL clusters records together and writes them out in batches.
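
You can watch this batching happen: most dirty buffers reach disk through checkpoints and the background writer rather than through the sessions doing the INSERTs. For example (column names as in pg_stat_bgwriter up to PostgreSQL 16; newer releases move some of them into pg_stat_checkpointer):

    SELECT checkpoints_timed,      -- checkpoints triggered by the timer
           checkpoints_req,        -- checkpoints forced by WAL volume
           buffers_checkpoint,     -- buffers written out by checkpoints
           buffers_clean,          -- buffers written by the background writer
           buffers_backend         -- buffers written directly by client backends
    FROM   pg_stat_bgwriter;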

A database that large is not a problem if the majority of the work is INSERTs. PostgreSQL will have to update the indexes here and there, but that is really an easy job. If you had lots of SELECTs on a database this size, then you really would need to shard.
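
If you go down this road, the main thing to control is how many indexes each INSERT has to touch. A minimal illustration (names made up):

    -- Append-mostly table: every index on it must be updated on every INSERT,
    -- so keep indexes to the minimum the occasional SELECTs actually need
    CREATE TABLE measurements (
        device_id   bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        reading     double precision
    );

    -- One narrow index is cheap to maintain;
    -- each additional index multiplies the write amplification
    CREATE INDEX measurements_recorded_at_idx ON measurements (recorded_at);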

I once worked on an Oracle DB (Oracle 10g) with 400 TB of data on a 16 GB server, one instance only. That workload was primarily INSERTs too - a few SELECTs per day and millions of INSERTs every day. Performance was far from being a problem.