Two mirrored database servers, or one twice as powerful
I'm looking at setting up a separate database server for one of my applications. The application currently runs on 3 servers: 1 frontend server and 2 application/database servers. For performance reasons I'm looking to move the database onto its own machine(s).
So it comes down to this: am I better off getting two servers and setting them up as replicas of each other, or just one server that's twice as powerful?
Things I've thought of so far:
With one server:
- Easier to manage
- More CPU/memory/disk available to the database, as the OS only takes up one set of resources
- More storage space (if the DB were replicated I'd need 2x the disk space to store the same data)
- Cheaper (with rented servers only a little, but I imagine more so with purchased servers)
With two servers:
- Better redundancy (a kernel panic, disk failure, OS failure or network failure on one node and the database as a whole would still work, though probably a bit slower)
- Less disk/network I/O per server (would this make it faster?)
Some technical details specific to my situation:
The servers would run on a cloud service called StormOnDemand, so hardware reliability shouldn't be much of an issue, and we could scale vertically up and down as needed.
We use 4 different database systems: PostgreSQL, MongoDB, Redis and Memcached*
Last time I checked, we averaged just over 2 million transactions a day on PostgreSQL and about 2.5 million on MongoDB. I'm not sure about Redis and Memcached, but since we use them both for caching I'd imagine the volume is about the same.
The servers are all connected via a 1 Gbit/s local network.
The databases are, I would say, about medium size (PostgreSQL: 32 GB, MongoDB: 16 GB; Redis and Memcached both store data in memory, currently around 12 GB and 4 GB respectively).
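If it's useful, here's a minimal Python sketch of how those sizes can be checked (connection details, hostnames and the `myapp` database name are placeholders; psycopg2, pymongo and redis-py assumed):

```python
# Rough size check for each data store; connection details are placeholders.
import psycopg2
from pymongo import MongoClient
import redis

# PostgreSQL: on-disk size of the current database
pg = psycopg2.connect("host=db1 dbname=myapp user=postgres")
with pg.cursor() as cur:
    cur.execute("SELECT pg_size_pretty(pg_database_size(current_database()))")
    print("PostgreSQL:", cur.fetchone()[0])

# MongoDB: data size reported by dbStats (bytes)
mongo = MongoClient("mongodb://db1:27017")
stats = mongo["myapp"].command("dbStats")
print("MongoDB: %.1f GB" % (stats["dataSize"] / 1024 ** 3))

# Redis: memory currently in use
r = redis.Redis(host="db1", port=6379)
print("Redis:", r.info("memory")["used_memory_human"])
```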
The servers I'd be looking at getting would either be a single server with 8 CPUs and 30 GB RAM, or two servers with 4 CPUs and 8 GB RAM each.
On redundancy: it's a web app and the vast majority of our visitors are recurring, so although we don't want downtime, we'd accept a couple of hours a month for the cost savings we'd get from a single server. In 6 months running on SOD we've never had any downtime, but that's not to say it won't happen in the future.
If I've excluded any details that would be useful I'm happy to provide them, but I've tried to include as much as possible.
*Memcached would continue to run on the application servers. We use it as a sort of 'local cache', so it's not replicated/sharded across servers, with Redis acting as the distributed cache (roughly the lookup pattern sketched below).
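A simplified cache-aside sketch of that layering (placeholder hosts, a hypothetical `load_from_db` loader, values assumed to be strings/bytes; pymemcache and redis-py assumed):

```python
# Illustrative two-tier cache-aside lookup: local Memcached first, then
# shared Redis, then the database. Names and hosts are placeholders.
from pymemcache.client.base import Client as MemcacheClient
import redis

local_cache = MemcacheClient(("127.0.0.1", 11211))    # per-app-server cache
shared_cache = redis.Redis(host="cache1", port=6379)  # distributed cache

def get_value(key, load_from_db, ttl=300):
    value = local_cache.get(key)
    if value is not None:
        return value
    value = shared_cache.get(key)
    if value is None:
        value = load_from_db(key)          # hypothetical DB loader
        shared_cache.setex(key, ttl, value)
    local_cache.set(key, value, expire=ttl)
    return value
```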
Solution 1:
Typically databases benefit from big iron. As a general rule of thumb, adding extra boxes helps performance for web and application servers - but for databases the answer is usually to get a bigger, faster machine.
But the flip side to this is that having more equivalent nodes improves availability as well as performance. Consider: if you spend $3000 on a shiny server, it might have a (say) 1% probability of failing within its lifecycle, whereas you might instead buy 2 basic-spec machines each with a 5% probability of failure. Are you better off spending your money on the shiny box? The probability of BOTH basic machines failing is much lower: 0.05 * 0.05 = 0.0025, i.e. 0.25%, so the pair is 4 times less likely to be completely down.
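To make the arithmetic concrete, here's a quick back-of-the-envelope check using those hypothetical 1% and 5% figures (and assuming the two machines fail independently):

```python
# Back-of-the-envelope availability comparison (independent failures assumed).
p_big_box = 0.01   # hypothetical failure probability of the single big server
p_basic   = 0.05   # hypothetical failure probability of each basic server

p_both_basic_fail = p_basic * p_basic                   # both replicas lost
print(f"Single box down:    {p_big_box:.2%}")           # 1.00%
print(f"Both replicas down: {p_both_basic_fail:.2%}")   # 0.25%
print(f"Ratio: {p_big_box / p_both_basic_fail:.0f}x")   # 4x
```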
But an awful lot depends on the DBMS you are using. NoSQL databases are designed to scale across lots of servers, and MySQL replication is relatively reliable and efficient; but both MySQL and NoSQL databases tend not to replicate in real time. If you need consistency across the cluster then you need something like Oracle RAC for multiple nodes, and IME that generates huge volumes of inter-node chatter; in that scenario you'd be better off buying the bigger box.
So in your case, MongoDB and Redis would definitely favour the multi-node solution. My knowledge of PostgreSQL clustering is limited, but given its support for consistency in queries I'd expect it to behave more like Oracle (i.e. inflated amounts of local disk and network I/O).
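As a rough illustration of the multi-node side, here's a minimal pymongo sketch (hostnames, the `myapp` database name and the replica set name `rs0` are all placeholders) that connects to a two-node replica set and lets reads go to the secondary, so the second box does useful work and the driver fails over automatically if the primary dies:

```python
# Minimal pymongo sketch: connect to a hypothetical two-node replica set
# and allow reads from the secondary. Note that secondary reads can lag
# slightly behind the primary, since replication is asynchronous.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://db1:27017,db2:27017/?replicaSet=rs0&readPreference=secondaryPreferred"
)
db = client["myapp"]
print(db.command("ping"))  # simple health check; failover is handled by the driver
```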
I would have said that your datasets are on the small side. Indeed, I'm rather alarmed that you use 4 different substrates for managing your data, and would strongly recommend consolidating them.
Solution 2:
I would have to say: go with whatever you think you can get away with that has the least risk of data loss. You talk of using cloud hosting, and with that a single server would, in my opinion, be more practical. You also mention cost a few times, so I take it money is tight, and it's much easier to increase the specs of a single system than to find an exact match for another, less powerful system. Always ask yourself "Do I really need this?" and you can't go far wrong.