How do you do load testing and capacity planning for web sites?

This is a canonical question about capacity planning for web sites.

Related:

Can you help me with my capacity planning?

How do you do load testing and capacity planning for databases?

What are some recommended tools and methods of capacity planning for web sites and web-applications?

Please feel free to describe different tools and techniques for different web-servers, frameworks, etc., as well as best-practices that apply to web servers in general.

Solution 1:

The short answer is: Nobody can answer this question except you.

The long answer is that benchmarking your specific workload is something that you need to undertake yourself, because it's a bit like asking "How long is a piece of string?".

A simple one-page static website could be hosted on a Pentium Pro 150 and still serve thousands of impressions every day.

The basic approach you need to take to answer this question is to try it and see what happens. There are plenty of tools that you can use to artificially put your system under pressure to see where it buckles.

A brief overview of this is:

1. Put your scenario in place
2. Add monitoring
3. Add traffic
4. Evaluate results
5. Remediate based on results
6. Rinse, repeat until reasonably happy

Put your scenario in place

To test under load, you need something to test against, so set up a test environment. It should match your production hardware as closely as possible; otherwise you will be left extrapolating from your data.

Set up your servers, accounts, websites, bandwidth, etc. Even if you do this on VMs that's OK just as long as you're prepared to scale your results.

So, I'm going to set up a mid-powered virtual machine (two cores, 512 MB RAM, 4 GB HDD) running Red Hat Linux and install my favourite load balancer, haproxy, on it.

I'm also going to have two web servers behind the load balancer that I'm going to use to stress test the load balancer. These two web servers are set up identically to my live systems.

Add monitoring

You'll need some metrics to monitor, so I'm going to measure how many requests get through to my web servers, and how many requests I can squeeze through per second before users start getting a response time of over two seconds.

I'm also going to monitor RAM, CPU and disk usage on the haproxy instance to make sure that the load balancer can handle the connections.

How to do this depends a lot on your platforms and is outside of the scope of this answer. You might need to review web server log files, start performance counters, or rely on the reporting ability of your stress test tool.

A few things you always want to monitor:

CPU usage
RAM usage
Disk usage
Disk latency
Network utilisation

You might also choose to look at SQL deadlocks, seek times, etc depending on what you're specifically testing.
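As one example of reviewing web server log files, here is a minimal sketch that counts requests per second from an access log. It assumes the log uses the Common Log Format timestamp (for example [10/Oct/2000:13:55:36 -0700]); the function names are made up for illustration, and your server's log format may differ.

```python
import re
from collections import Counter

# Matches a Common Log Format timestamp, e.g. [10/Oct/2000:13:55:36 -0700].
# Assumption: your access log uses this format.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_second(log_lines):
    """Count how many logged requests fall into each one-second bucket."""
    counts = Counter()
    for line in log_lines:
        match = TIMESTAMP.search(line)
        if match:  # skip lines without a recognisable timestamp
            counts[match.group(1)] += 1
    return counts


def peak_rate(log_lines):
    """Return (timestamp, count) for the busiest second, or None if no requests."""
    counts = requests_per_second(log_lines)
    return counts.most_common(1)[0] if counts else None
```

Since requests_per_second accepts any iterable of lines, you can pass it an open log file directly. Comparing the peak rate on each web server with what the load tool claims to have sent is a quick check that requests are actually getting through.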

Add traffic

This is where things get fun. Now you need to simulate a test load. There are plenty of tools that can do this, with configurable options:

JMeter (Web, LDAP)
Apache Benchmark (Web)
Grinder (Web)
httperf (Web)
WCAT (Web)
Visual Studio Load Test (Web)
SQLIO (SQL Server)

Choose a number, any number. Let's say you're going to see how the system responds with 10,000 hits a minute. It doesn't matter what number you choose because you're going to repeat this step many times, adjusting that number up or down to see how the system responds.

Ideally, you should distribute these 10,000 requests over multiple load testing clients/nodes so that a single client does not become the bottleneck. For example, JMeter's Remote Testing provides a central interface from which to launch several clients from a controlling JMeter machine.

Press the magic Go button and watch your web servers melt down and crash.
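If you just want to see what a load tool does under the hood, here is a rough sketch of a tiny load generator using only the Python standard library. It is not a substitute for the tools listed above: it runs from a single client, and it pushes requests as fast as the worker threads allow rather than at a controlled rate. The function names are made up for this example.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def http_get(url, timeout=10.0):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()  # drain the body so the whole transfer is timed
        return response.status


def timed_call(fetch):
    """Run one request and return (seconds_taken, succeeded)."""
    start = time.perf_counter()
    try:
        fetch()
        ok = True
    except Exception:
        ok = False  # connection errors, timeouts and HTTP errors count as failures
    return time.perf_counter() - start, ok


def run_load(fetch, total_requests, concurrency):
    """Issue total_requests calls to fetch() using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_call, fetch) for _ in range(total_requests)]
        return [future.result() for future in futures]
```

A call might look like run_load(lambda: http_get("http://test-lb.example/"), 1000, 50), where the URL is a placeholder for your own test environment. The result is a list of (seconds, succeeded) pairs that you can feed into whatever analysis you like.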

Evaluate results

So, now you need to go back to the metrics you collected in step 2. You see that with 10,000 hits a minute, your haproxy box is barely breaking a sweat, but the response time with two web servers is a touch over five seconds. That's not cool - remember, you're aiming for a response time of two seconds. So, we need to make some changes.
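A sketch of how that check might look in code, given a list of measured response times in seconds. Judging by the 95th percentile rather than the average is my assumption here, not something the two-second target dictates; pick whatever percentile matches what you promise your users.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples to evaluate")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def meets_target(latencies, target_seconds=2.0, pct=95):
    """Return (passed, observed), where observed is the pct-th percentile latency."""
    observed = percentile(latencies, pct)
    return observed <= target_seconds, observed
```

For the run described above, where responses take a touch over five seconds, meets_target would report a failure along with the observed figure, which is the number you then try to bring down.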

Remediate

Now, you need to make your website more than twice as fast. To do that, you either scale up or scale out.

To scale up, get bigger web servers, more RAM, faster disks.

To scale out, get more servers.

Use your metrics from step 2, and testing, to make this decision. For example, if you saw that the disk latency was massive during the testing, you know you need to scale up and get faster hard drives.

If you saw that the processor was sitting at 100% during the test, perhaps you need to scale out to add additional web servers to reduce the pressure on the existing servers.

There's no generic right or wrong answer; there's only what's right for you. Try scaling up, and if that doesn't work, scale out instead. Or not. It's up to you and some thinking outside the box.

Let's say we're going to scale out. So I decide to clone my two web servers (they're VMs) and now I have four web servers.

Rinse, repeat

Start again from step 3. If you find that things aren't going as you expected (for example, we doubled the web servers, but the response times are still more than two seconds), then look into other bottlenecks. For example, you doubled the web servers, but still have a crappy database server. Or, you cloned more VMs, but because they're on the same physical host, you only achieved higher contention for the server's resources.
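The repeat loop can be sketched as a simple ramp: raise the load in steps until the response time misses the target, and report the last load level that passed. The measure callable is hypothetical. It stands in for "run one full test at this load and return the response time in seconds", however you choose to do that.

```python
def find_capacity(measure, start, step, limit, target_seconds=2.0):
    """Raise the load until the measured response time exceeds the target.

    measure(load) must run one test at that load level and return a
    response time in seconds. Returns the highest load that still met
    the target, or None if even the starting load missed it.
    """
    best = None
    load = start
    while load <= limit:
        if measure(load) > target_seconds:
            break  # target missed: the previous level is the capacity
        best = load
        load += step
    return best
```

Run it once against the current setup, change one thing (more web servers, faster disks, a better database server), and run it again. If the number doesn't move, you changed something that wasn't the bottleneck.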

You can then use this procedure to test other parts of the system. Instead of hitting the load balancer, try hitting the web server directly, or the SQL server using an SQL benchmarking tool.

Solution 2:

Capacity planning starts with measurement, in this case response time versus load. Once you know the degree to which the program slows down with load, which is NOT a linear function, you can select a response time target, and then discover what resources it will take to meet that target for a given amount of load.
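To illustrate why the slowdown is not linear, here is a sketch using the textbook M/M/1 queueing model (one server, random arrivals). This model is my own choice of illustration and a big simplification of a real web stack, so treat the numbers as a shape to expect, not a prediction. In it, mean response time is R = S / (1 - λS), where S is the service time of one request and λ is the arrival rate.

```python
def mm1_response_time(service_time, arrival_rate):
    """Mean response time in seconds under the M/M/1 model (an assumed model)."""
    utilisation = arrival_rate * service_time
    if utilisation >= 1:
        raise ValueError("load meets or exceeds capacity; the queue grows without bound")
    return service_time / (1 - utilisation)


def max_rate_for_target(service_time, target_seconds):
    """Largest arrival rate (requests/second) that keeps mean response time at or under the target."""
    if target_seconds <= service_time:
        raise ValueError("target is not achievable even with no queueing")
    # Solving S / (1 - rate * S) <= T for rate gives rate <= 1/S - 1/T.
    return 1 / service_time - 1 / target_seconds
```

With a service time of 0.1 seconds, going from 5 to 9 requests per second takes the modelled response time from about 0.2 to about 1 second: less than double the load, roughly five times the wait. That is the kind of curve your own measurements should reveal.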

Performance measurement is always done with time units, as

they are what users care about
they can be scaled up and down

Things like %CPU and IOPS are system-specific, so you only use them when you have planned the system and measured it in pre-production, to act as a "surrogate" for the thing you care about, time.

Solution 3:

Capacity planning is a troublesome beast. It's as much science as art (if definitely a dark one).

Your best case is that you make well-informed decisions and fortune/luck favors you by having reality meet your assumptions. If your capacity need assumptions match reality, you look like a mystical yogi. Unfortunately, if your assumptions exceed reality, you will appear to have overshot and overspent. More unfortunately, if your assumptions are below the eventual reality (or are otherwise incorrect), you will lack the capacity that you need, and will have to scramble to mitigate the failures of your groaning infrastructure, which makes you look like you lack competency.

No pressure...

Unfortunately, the dark art of capacity planning is more than can be reasonably distilled into a single Server Fault answer; really, it's a topic worthy of books.

Fortunately, there is such a book: "The Art of Capacity Planning"

Solution 4:

To expand on Mark Henderson's post, I'm writing this specific to Apache. To reiterate what he said, "The short answer is: Nobody can answer this question except you." The text of this answer is borrowed heavily from my answer to a similar question about a Drupal website's performance.

Configuring Apache With Mod_Prefork

Apache is arguably one of the most popular web servers available, if not the most popular. It is open source and is still actively maintained. You can run it on both Linux and Windows operating systems, but it is more popular in the Linux / Unix world.

You should never use an out-of-the-box Apache config. You always need to tune Apache to your site. The main Apache configuration file on CentOS is located at /etc/httpd/conf/httpd.conf, and the main Apache config file on Ubuntu systems is typically located at /etc/apache2/apache2.conf. Additional config files are used for things like Virtual Hosts.

Like a lot of software, Apache is built to be flexible and customized according to a specific website's needs. There are different Multi-Processing Modules that Apache can be configured to use to bind to a network port and accept & process the requests.

Most default Apache installations that come with CentOS and Ubuntu servers use the MPM "mod_prefork". Assuming you're using mod_prefork (if you're not sure, it's the more likely option, but only you can determine that), here are the basics of how to configure it:

Figure out the maximum amount of memory you want Apache to be able to use.
Heavily test your website, and determine how much memory each Apache process uses (using top).
Take the Apache process in top that uses the most memory, add a little bit to it for good measure, and then divide your first number (maximum amount of memory you want Apache to use) by this new number.
The number you get should be the value of your MaxClients & ServerLimit directives (MaxClients was renamed MaxRequestWorkers in Apache 2.4; the old name still works).
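The arithmetic from the steps above, as a small sketch. The figures in the usage note are invented for illustration; use the memory budget and per-process size you actually measured.

```python
def max_clients(apache_memory_mb, largest_process_mb, headroom_mb=5):
    """Divide Apache's memory budget by the padded size of the largest process.

    headroom_mb is the "little bit for good measure"; 5 MB is an arbitrary default.
    """
    per_process_mb = largest_process_mb + headroom_mb
    if per_process_mb <= 0:
        raise ValueError("per-process memory must be positive")
    return int(apache_memory_mb // per_process_mb)
```

For example, if you are willing to give Apache 4096 MB and the largest process you saw in top was 55 MB, max_clients(4096, 55) gives 68 (4096 divided by 60, rounded down).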

This is certainly not the end-all answer. Tuning your Apache server takes time and requires experience to get just right.
