What is the point of staging?

I thought I'd worked this out, but after reading Continuous Delivery (excellent book) I'm a little confused. They talk about having servers for:

  • development
  • various forms of automated tests
  • user acceptance testing (UAT), i.e. sitting down with the client, demonstrating it to them, and letting them do exploratory testing. The in-house testers could also use this setup for exploratory testing.
  • staging
  • production.

I'd always thought of staging as providing the UAT function, but they seem to have staging as a separate level. So in that scheme, what function would the staging servers provide?


Solution 1:

Staging means putting the full product systems in place, but not actually using them yet. Once they go into use, that is "production". You put everything in place exactly as it will be used, test it, then flip the switch.

UAT commonly uses a "testing" environment that is significantly different from the hardware, software, and configuration that will be used in production.

For example, where I work we have the customers test everything in a VM environment running on our servers. When their system goes live it will be running on their hardware, at their facility, probably integrating with their existing systems. It will have absolutely nothing to do with our servers or test environment, except that the code and some configuration have been copied from there.

Solution 2:

I work on the release management team at a very large internet company. We use essentially the process you've outlined above, and we've chosen that process on purpose. In our methodology, staging serves as a branching mechanism for a final level of testing in production.

Obviously you want to do all testing before you go to production, but in a large, complex environment with lots of users, that's a very difficult goal to reach. In particular, it is virtually impossible to adequately load test software in QA. Functional testing is a lot easier to automate than load testing. When you have many thousands of users hitting your servers, things fail in weird and hard-to-predict ways.

So here's what we do:

  • Development
    • includes continuous integration and automated testing
  • release testing
    • my group analyzes the release itself
    • reviewing install logs
    • testing rollback
  • QA
    • user acceptance testing

That's the point at which we branch between staging and production. We use a train model for releases, with a new train starting every few weeks. Even-numbered trains go to the staging servers (which are in production). Odd-numbered trains do not.

In between the even trains, the developers have the ability to push individual changes to the staging servers (after those changes have been tested by QA, of course). This allows them to validate that their software performs as expected in a real production environment. This is generally reserved for the components that are deemed higher risk; we don't push every little piece to staging.

Then, everyone understands that when the next even train starts, it will wipe out what's on the staging servers and set them back to the train baseline. Developers either ensure that their changes got on the train, or decide they aren't ready for general use yet, in which case those changes just get erased on the staging servers.
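The train rules above can be sketched as a small model. This is only an illustration of the logic as described, not our actual tooling: the `StagingServers` class, its method names, and the component names and versions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


def goes_to_staging(train_number: int) -> bool:
    """Even-numbered trains are deployed to staging; odd-numbered ones are not."""
    return train_number % 2 == 0


@dataclass
class StagingServers:
    """Hypothetical model of the staging servers' state."""

    # Component versions installed by the most recent even train.
    baseline: Dict[str, str] = field(default_factory=dict)
    # Individual changes pushed by developers between even trains.
    pushed: Dict[str, str] = field(default_factory=dict)

    def push_change(self, component: str, version: str, qa_approved: bool) -> None:
        """Push a single change to staging; only QA-tested changes are allowed."""
        if not qa_approved:
            raise ValueError(f"{component} {version} has not been tested by QA")
        self.pushed[component] = version

    def deploy_train(self, train_number: int, contents: Dict[str, str]) -> None:
        """Deploy a train. An even train resets staging to the train baseline."""
        if not goes_to_staging(train_number):
            return  # odd trains never touch the staging servers
        self.baseline = dict(contents)
        self.pushed.clear()  # changes that missed the train are erased

    def running(self) -> Dict[str, str]:
        """What staging currently serves: baseline plus individual pushes."""
        return {**self.baseline, **self.pushed}


# Example with made-up component names and versions.
staging = StagingServers()
staging.deploy_train(4, {"search": "1.0", "checkout": "2.0"})
staging.push_change("checkout", "2.1-rc", qa_approved=True)
after_push = staging.running()

staging.deploy_train(5, {"search": "1.1", "checkout": "2.0"})  # odd: no effect
after_odd = staging.running()

staging.deploy_train(6, {"search": "1.1", "checkout": "2.0"})  # even: reset
after_even = staging.running()
```

In this sketch the pushed `checkout` change survives train 5 but disappears at train 6 because it was not included in that train's contents, which is the "got on the train or gets erased" behaviour described above.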

To sum up, the short answer (at least for us) is that it is impossible to completely test complex systems in QA. Staging provides a safe way to do limited production testing.

On a related note, here are my slides from a presentation I just gave on how our release process works.