How to grow from a single-server setup
Solution 1:
Buy some hardware, but put it in your test lab, not in the data centre. Then stress your application on various hardware/software combinations until you find a reasonable one which will do what you want.
Of course, you'll need to engineer something which can create fake traffic against a production-like database running a test copy of your app. But who said it would be easy?
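To make the fake-traffic idea concrete, here is a minimal load-generator sketch in Python. The hostname, paths, and worker counts are all assumptions you'd replace for your own app; purpose-built tools (ab, siege, JMeter) do this job better, but the principle is the same:

```python
import random
import threading
import time
import urllib.request

# All of these values are illustrative assumptions - point BASE_URL at
# your test copy of the app, never at production.
BASE_URL = "http://test-app.internal"
PATHS = ["/", "/search?q=widgets", "/product/123"]  # sample "hot" URLs
WORKERS = 20                 # concurrent fake users
REQUESTS_PER_WORKER = 100

def worker(results):
    for _ in range(REQUESTS_PER_WORKER):
        url = BASE_URL + random.choice(PATHS)
        start = time.perf_counter()
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                resp.read()
            results.append(time.perf_counter() - start)
        except OSError:
            results.append(None)  # record failures as well as successes

results = []
threads = [threading.Thread(target=worker, args=(results,))
           for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

ok = [r for r in results if r is not None]
if ok:
    print(f"{len(ok)}/{len(results)} succeeded, "
          f"mean latency {sum(ok) / len(ok):.3f}s")
else:
    print("all requests failed - is the test app up?")
```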
If you don't do this and simply try things in production, you've no idea whether they're going to work, and you may spend a LOT of engineering effort implementing things like caches (which come with their fair share of bugs!) on something that doesn't help.
Test, test, and test some more. Don't throw hardware/software changes into production until you've got good performance data showing that they're likely to improve matters significantly. Engineering effort is expensive; test hardware is not (particularly).
Memcached is just one option, and you probably needn't consider it until you've got the database's own caching working optimally. That means putting the database on a dedicated (64-bit, of course) box with a reasonable amount of RAM (not 4G - laptops have that nowadays; 32G is definitely affordable) and tuning it appropriately.
You've not mentioned how big your database is, but if it's at all feasible, you will want to get it entirely into RAM (or at least the hot parts). With the database entirely in RAM, read IO operations disappear and stop being a bottleneck.
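As a rough way to check whether the hot data already fits in RAM, you can look at the cache hit rate the database reports. Assuming MySQL/InnoDB (which the durability note below also assumes), a sketch using the PyMySQL driver with hypothetical test-box credentials:

```python
import pymysql  # pip install PyMySQL; any MySQL driver would do

# Hypothetical credentials for a test database server.
conn = pymysql.connect(host="db-test.internal", user="perf", password="secret")
with conn.cursor() as cur:
    cur.execute("SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%'")
    status = dict(cur.fetchall())

# Logical read requests vs. reads that actually had to hit the disk.
requests_total = int(status["Innodb_buffer_pool_read_requests"])
disk_reads = int(status["Innodb_buffer_pool_reads"])
if requests_total:
    hit_ratio = 1 - disk_reads / requests_total
    print(f"buffer pool hit ratio: {hit_ratio:.4%}")
# A ratio well below ~99% under load suggests the hot data doesn't fit;
# raise innodb_buffer_pool_size (and the box's RAM) accordingly.
```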
Profile your database queries. There are tools knocking around for doing this - you should be able to simulate production load in your test environment. The trick is to avoid slow queries and ensure that the frequently-executed ones are fast.
If your performance problems are related to IO syncs because you're simply doing too many transactions for the database, make sure you're using a battery-backed RAID controller that is behaving correctly (talk to your vendor about this). These give you far more IO write operations than a non-battery-backed controller, because the data only need to reach the controller's cache before the OS gets the write confirmation. Alternatively, if your data don't matter that much, consider relaxing the database's durability parameters (InnoDB's sync-on-commit setting).
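For the relax-durability option, the usual InnoDB knob is innodb_flush_log_at_trx_commit. A hedged sketch (hypothetical admin credentials; only do this where losing up to a second of committed transactions on a crash is acceptable):

```python
import pymysql  # pip install PyMySQL

# Hypothetical admin connection to a box whose last ~1s of transactions
# you can afford to lose on a crash.
conn = pymysql.connect(host="db-test.internal", user="admin", password="secret")
with conn.cursor() as cur:
    # 1 (default): fsync the redo log on every commit - full durability.
    # 2: write the log at commit but fsync only about once per second,
    #    trading up to ~1s of transactions for far fewer IO syncs.
    cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")
```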
Solution 2:
By looking into caching solutions, as many others here have suggested, you can expect to end up with about 10% of the load you have today, maybe less.
However, this depends on what kind of services you run on your machine. You can do a lot with memcached without needing much RAM.
You should try to profile which database queries take the longest, either by using MySQL's slow query log (or the equivalent for your database) or by using a tool such as mytop. MySQL's EXPLAIN SELECT syntax may also be helpful.
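For instance, assuming MySQL and a hypothetical orders table, you might enable the slow query log on a test box and then ask the server how it executes a suspect query:

```python
import pymysql  # pip install PyMySQL

# Hypothetical test-box connection; table and threshold are illustrative.
conn = pymysql.connect(host="db-test.internal", user="perf", password="secret")
with conn.cursor() as cur:
    # Log anything slower than 0.5 seconds to the slow query log.
    cur.execute("SET GLOBAL slow_query_log = 'ON'")
    cur.execute("SET GLOBAL long_query_time = 0.5")

    # Then take a suspect query from the log and ask MySQL how it runs it.
    cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = 42")
    for row in cur.fetchall():
        print(row)  # look for full table scans ('ALL') and missing keys
```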
Caching the results of a few selected MySQL queries (even for just a short period of time) can improve your performance a lot.
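As a sketch of what that looks like with memcached (hypothetical hosts, key, query, and TTL; the pymemcache and PyMySQL packages are one way to do it in Python):

```python
import pymysql                               # pip install PyMySQL
from pymemcache.client.base import Client    # pip install pymemcache

# Hypothetical hosts; the query, key, and TTL are illustrative.
cache = Client(("localhost", 11211))
db = pymysql.connect(host="db-test.internal", user="app", password="secret")

def top_products():
    """Cache an expensive query's result for 60 seconds."""
    key = "top_products_v1"
    cached = cache.get(key)
    if cached is not None:
        return cached  # served from memcached, no database hit
    with db.cursor() as cur:
        cur.execute("SELECT id, name FROM products "
                    "ORDER BY sales DESC LIMIT 10")
        result = repr(cur.fetchall()).encode()
    cache.set(key, result, expire=60)  # even a short TTL absorbs most reads
    return result
```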
Solution 3:
I do a lot of performance and scale-out work, and what I've discovered is that:
- Every application load is unique.
- Generic responses such as "add more RAM", "get another server", "do y", "try x" are often lessons in frustration and lead to convoluted setups.
Measure the Right Things
One of the biggest challenges is determining which benchmarks are important. This often requires a step back: you have to put yourself in your client's shoes. Sometimes simple site design changes can mean huge benefits to the web visitor. This is why I like tools like YSlow!, which focus more on the end-user's experience than on the server level. Once you decide what the right benchmark is for your site, then you can begin tuning. Benchmarks may be total page load time, total page size, cache effectiveness, site latency, etc. You have to pick the one that makes sense for your application.
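A tool like YSlow! analyses the full browser experience (assets, headers, render behaviour); as a crude first cut, you can at least track total fetch time and page size over time. A minimal sketch, with a hypothetical URL:

```python
import time
import urllib.request

URL = "https://www.example.com/"  # hypothetical page to track

def measure(url, runs=5):
    """Fetch a page a few times; print status, size, and fetch time.
    (This misses images/CSS/JS and render time, which YSlow! covers.)"""
    for i in range(runs):
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
            status = resp.status
        elapsed = time.perf_counter() - start
        print(f"run {i + 1}: HTTP {status}, {len(body)} bytes, {elapsed:.3f}s")

measure(URL)
```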
Nuts and Bolts
Once you are tracking the right benchmarks, start at a very low level. I like to use sysstat. It gives you a wealth of information and helps you tease apart which subsystem may be limiting overall application performance. Generally, I boil performance issues down to:
- network stack
- memory stack
- disk io
- application layer
- os layer
Using sysstat and other tools, you can begin to split hairs and find the system that is limiting performance.
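sar, iostat, and vmstat from sysstat are the usual way to watch these counters; if you'd rather sample them programmatically, here's a minimal sketch using the psutil package (my choice of library, not part of sysstat):

```python
import psutil  # pip install psutil

# Take two samples one second apart and print deltas per subsystem -
# a crude stand-in for watching sar/iostat output scroll by.
disk0, net0 = psutil.disk_io_counters(), psutil.net_io_counters()
cpu = psutil.cpu_percent(interval=1.0)  # blocks for 1s while sampling
disk1, net1 = psutil.disk_io_counters(), psutil.net_io_counters()

mem = psutil.virtual_memory()
print(f"cpu: {cpu}%  mem used: {mem.percent}%")
print(f"disk: {disk1.read_bytes - disk0.read_bytes} B read, "
      f"{disk1.write_bytes - disk0.write_bytes} B written in 1s")
print(f"net: {net1.bytes_recv - net0.bytes_recv} B in, "
      f"{net1.bytes_sent - net0.bytes_sent} B out in 1s")
# Whichever counter stays pinned while the app struggles is pointing
# at your bottleneck.
```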
For example, I've seen highly loaded servers fail due to how their application was configured. Poor caching, lack of expires headers on static content, using HTTP vs. file includes, etc. all contributed to poor application performance. Fixing these application issues required no hardware changes. In other cases, I've seen the disks maxed out despite tons of caching. Moving to faster disks fixed the issue.
Rinse and Repeat
Often during application tuning, you will uncork one bottleneck only to find another. This is why I recommend monitoring whatever it is you are tuning.
For example, say you fix a disk IO issue but your app is still slow. You may think you have wasted your effort, but what's happened is that you've simply hit another bottleneck. By monitoring disk IO carefully, you can be sure you are improving it even if your headline application performance metrics are not changing.
Get the Right Tools
Make sure you are using the right tools for the job. Monitoring, testing, benchmarking, profiling, and other optimization techniques all have a variety of tools. Find the one that best matches your situation.
Rules of Thumb
While every app is unique, I do find some standard starting points:
- memory - databases love memory
- disk IO - anything but RAID 10 can kill database performance
- wrong optimizations - big values do not translate into big performance
- application - blaming the server for poor app design
Your Next Steps
If you haven't found your bottleneck, adding a server may not help much. To solve a disk IO problem, you may need another server or a SAN. If you have a RAM bottleneck, another server solves the issue only in that it adds more RAM - a pretty costly move compared to just adding more RAM to your existing server.
Quick Fix
Over-deploy. I've had to do this when the application stack appears to be the problem. Basically, load up on CPU, RAM, and disk IO (RAID 10, 15K SCSI, or SSDs). Go big on the hardware and then start tuning. This keeps you afloat until you solve the real problems.