Suddenly millions of visitors to an EC2 instance running WordPress multisite. Very slow, now what? [duplicate]

This is a canonical question about capacity planning

Related:

  • How do you do load testing and capacity planning for web sites?
  • How do you do load testing and capacity planning for databases?

I have a question regarding capacity planning. Can the Server Fault community please help with the following:


  • What kind of server do I need to handle some number of users?
  • How many users can a server with some specifications handle?
  • Will some server configuration be fast enough for my use case?
  • I'm building a social networking site: what kind of hardware do I need?
  • How much bandwidth do I need for some project?
  • How much bandwidth will some number of users use in some application?

The Server Fault community generally can't help you with capacity planning - the best answer we can offer is "Benchmark your code on hardware similar to what you'll be using in production, identify any bottlenecks, then determine how much of a workload your current hardware can handle, and/or how much hardware horsepower you need to handle your target workload".
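As a rough illustration of the last step in that advice, the arithmetic for turning a benchmark result into a server count might look like the Python sketch below. Everything in it is an assumption for illustration: the function name, the idea that your benchmark yields a sustainable requests-per-second figure per server, and the `headroom` factor (the fraction of benchmarked capacity you are willing to use in production). It also assumes the workload scales roughly linearly across servers, which you would need to verify for your own application.

```python
import math


def servers_needed(target_rps, measured_rps_per_server, headroom=0.7):
    """Estimate how many servers a target workload needs.

    target_rps: peak requests/second you want to serve (your own estimate).
    measured_rps_per_server: sustainable requests/second from YOUR benchmark.
    headroom: fraction of benchmarked capacity to actually use (hypothetical
              default; pick a value that matches your risk tolerance).
    """
    if measured_rps_per_server <= 0:
        raise ValueError("measured_rps_per_server must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in the range (0, 1]")
    usable_rps = measured_rps_per_server * headroom
    # Round up: a fraction of a server still means buying a whole one.
    return max(1, math.ceil(target_rps / usable_rps))


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(servers_needed(target_rps=1000, measured_rps_per_server=200, headroom=0.5))
```

The result is only as good as the benchmark feeding it, which is why the measurement has to come from your own code on representative hardware.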


There are a number of factors at play in capacity planning which we can't adequately assess on a Question and Answer site:

  • The requirements of your particular code/software
  • External resources (databases, other software/sites/servers)
  • Your workload (peak, average, queueing)
  • The business value of performance (cost/benefit analysis)
  • The performance expectations of your users
  • Any service level agreements/contractual obligations you may have

A proper analysis of these factors, and others, is beyond the scope of a simple question-and-answer site: it requires detailed knowledge of your environment and requirements, which only your team (or an adequately-compensated consultant) can gather efficiently.


Some Capacity Planning Axioms

  1. RAM is cheap
    If you expect your application to use a lot of RAM you should put in as much RAM as you can afford / fit.
  2. Disk is cheap
    If you expect to use a lot of disk you should buy big drives - lots of them.
    SAN/NAS storage is less cheap, and should also usually be spec'd large rather than small to avoid costly upgrades later.
  3. Workloads grow over time
    Assume your resource needs will increase.
    Bear in mind that the increase may not be symmetrical (CPU and RAM may rise faster than disk), and it may not be linear.
  4. Electricity is expensive
    Even though RAM and disks have decreased in price considerably, the cost of electricity has gone up steadily. All those extra disks and RAM, not to mention CPU power, will increase your electricity bill (or the bill you pay to your provider). Plan accordingly.
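The third axiom, that workloads grow and that the growth may not be linear, can be made concrete with a small projection. The Python sketch below assumes a constant compound monthly growth rate, which is a simplification chosen for illustration; the function names and sample numbers are hypothetical, and real growth curves should come from your own monitoring data.

```python
def projected_usage(current, monthly_growth, months):
    """Project resource usage assuming compound monthly growth.

    monthly_growth is a fraction, e.g. 0.10 for 10% per month (an assumption).
    """
    return current * (1 + monthly_growth) ** months


def months_until_full(current, capacity, monthly_growth):
    """Count whole months until projected usage exceeds capacity."""
    if monthly_growth <= 0:
        raise ValueError("monthly_growth must be positive for this estimate")
    months = 0
    while current <= capacity:
        current *= 1 + monthly_growth
        months += 1
    return months


if __name__ == "__main__":
    # Illustrative numbers only: 100 GB used today, 200 GB available.
    print(projected_usage(100, 0.10, 12))
    print(months_until_full(100, 200, 0.10))
```

Running the same projection separately for CPU, RAM, and disk, each with its own growth rate, shows which resource is likely to run out first.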

Virtual Machine Count Planning

When it comes to figuring out how many VMs you should plan for on a single host, there are actually no really good rules of thumb. In fact, there is only one, and it is only kind of good:

Virtual-Machine counts are usually bounded by RAM, except for when they're not.

Which isn't terribly helpful. If those VMs are going to be running low-CPU applications, then RAM will be your limiting factor. Each VM platform has its own ability to oversubscribe RAM, so it isn't as simple as TOTAL_RAM / Per-VM-RAM = MachineCount, but that number is a good starting point for planning.
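A minimal Python sketch of that RAM-based starting point follows. The `host_reserved_gb` and `overcommit` parameters are assumptions added for illustration: the first stands for memory the hypervisor itself needs, the second for whatever oversubscription ratio your platform can safely sustain. Neither value comes from any particular hypervisor's documentation.

```python
import math


def ram_bound_vm_count(total_ram_gb, per_vm_ram_gb,
                       host_reserved_gb=0, overcommit=1.0):
    """Estimate the RAM-bound VM count for one host.

    With the defaults this is simply TOTAL_RAM / Per-VM-RAM, rounded down.
    host_reserved_gb and overcommit are hypothetical tuning knobs.
    """
    if per_vm_ram_gb <= 0:
        raise ValueError("per_vm_ram_gb must be positive")
    usable_gb = (total_ram_gb - host_reserved_gb) * overcommit
    # Round down: a partially-fitting VM does not fit.
    return max(0, math.floor(usable_gb / per_vm_ram_gb))


if __name__ == "__main__":
    # Illustrative numbers only.
    print(ram_bound_vm_count(256, 8))
    print(ram_bound_vm_count(256, 8, host_reserved_gb=16, overcommit=1.5))
```

Treat the result as an upper bound from RAM alone; other resources may cap the count lower.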

But what if your VMs are doing things besides low-CPU packet-slinging?


Virtual-machine counts are bounded by seven discrete resources available to the host machine:

  • Hypervisor: VMware, Xen, Hyper-V, KVM, whatever. Each has its own count-impacting features. Some are very good at memory-page deduplication, others not so much. Some don't permit oversubscription of CPU capacity, some do.
  • CPU Core Speed: This limits the maximum single-threaded performance a VM can achieve. 36 cores of a 1.8 GHz CPU may add up to 64.8 GHz of CPU on a host, but no single thread will run faster than 1.8 GHz.
  • CPU Core Count: Together with core speed, this sets the ceiling on the total CPU performance available to your VMs.
  • System RAM: As described above, this limits the number of VMs you can run. Certain hypervisors are better than others at things like memory-page deduplication, so if you're running 100 identical VMs you can pack a lot more of them onto such a deduplicating system than if you were running 100 completely different VMs.
  • Disk Size: Each OS image takes a certain amount of space, and you need enough space to store them all. Disk size therefore puts an upper limit on how many VMs you can host.
  • I/O Bandwidth: The disk underlying the VMs can only handle so many I/Os per second. If you throw too much at it, systems will bog down waiting for I/O to complete. This puts an upper limit on how many I/O-consuming VMs you can run.
  • Network Bandwidth: For network-using VMs, the available network bandwidth puts a ceiling on how many such VMs you can run on a given host.

Any of these can be the thing you trip over; it all depends on what you're doing with your VMs. Some things to remember:

  • There is no such thing as a generic system.
  • There is no such thing as a generic web-server, since application code can run from barely-moves-the-needle CDN-style serving, to big deep-crack stuff like video transcoding.
  • There is no such thing as a generic database server. These can run from tiny systems used just for session-state-tracking, to very big ones.

To figure out how many VMs you can pack into a host-system, you need to know how your systems run and what they require to run well. Once you know that, you can then do the count-planning. And better yet, figure out how beefy you need to make your host-systems!
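Once you have measured per-VM requirements, the count-planning itself can be sketched as taking the smallest of the per-resource limits. The Python below is one possible way to do that, not a definitive method: the function names, resource keys, and numbers are all hypothetical. It covers only the resources that divide cleanly (cores, RAM, disk, IOPS, network). Hypervisor features and per-core speed are not divisible quantities, so they have to be checked separately.

```python
import math


def vm_count_limits(host, per_vm):
    """Return how many VMs each resource alone would allow.

    host and per_vm are dicts keyed by resource name (hypothetical keys).
    Resources a VM does not consume (demand of 0) are skipped.
    """
    return {
        name: math.floor(host[name] / demand)
        for name, demand in per_vm.items()
        if demand > 0
    }


def binding_constraint(host, per_vm):
    """Return (resource_name, vm_count) for the tightest limit."""
    limits = vm_count_limits(host, per_vm)
    name = min(limits, key=limits.get)
    return name, limits[name]


if __name__ == "__main__":
    # Illustrative numbers only; substitute your own measurements.
    host = {"cpu_cores": 36, "ram_gb": 256, "disk_gb": 4000,
            "iops": 20000, "net_mbps": 10000}
    per_vm = {"cpu_cores": 2, "ram_gb": 8, "disk_gb": 40,
              "iops": 800, "net_mbps": 100}
    print(vm_count_limits(host, per_vm))
    print(binding_constraint(host, per_vm))
```

With these made-up inputs the CPU core count is the limit rather than RAM, which is the "except for when they're not" case. Running it the other way, with a target VM count and unknown host specs, tells you how beefy the host needs to be.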