Once you move beyond 100k hits/month, what would people in this community say is the biggest hurdle?

My situ: Tons of static media (audio/video/images) being served off of S3/CDN, but being stored locally as backup (not served though). Everything that can be cached, is cached, with about 8 gigs of memory scalable to 32.

We're currently handling about 100k hits without a problem, and would love to know what problems others have run into: load balancing? memory issues? disk i/o?

Thanks for any tips. I've looked through the related questions and their answers, which were good, but just wanted to get some more feedback.


Solution 1:

Hardware is so cheap these days that increasingly few people even get to the point where they need more than one machine. $10,000 will buy you:

  • 16-32GB of RAM;
  • 2 quad core Xeons;
  • RAID5 disk array.

That kind of machine can serve over 10,000 concurrent users on an even mildly optimized site for all but the most resource intensive applications.

Basically there are two approaches to scalability:

  • Vertical: basically buying the biggest machine you can so you don't need more than one;
  • Horizontal: doing things in a way that lends itself to parallelism. Only needed on the most intensive of applications.

Look at StackOverflow: it's basically run on a Web server plus a database server, and it does in excess of 6 million hits a month.

That being said, scalability is about finding and addressing your bottleneck.

  • If your database is slowing things down either give it more resources or use some form of in-memory caching to take the load off;
  • If disk I/O is your problem then the same applies;
  • If you're running out of memory to the point where it's causing too many page faults and thus causing a disk I/O problem, add more memory;
  • Does your application and its data lend themselves to partitioning across servers? If so, that's one way of scaling horizontally;
  • If bandwidth is an issue and you're delivering large files then perhaps a CDN is the answer;
  • And so on.
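The in-memory caching point above can be sketched in miniature with Python's `functools.lru_cache`; the `expensive_lookup` function here is a hypothetical stand-in for a slow database or disk read, not anything from the original setup:

```python
from functools import lru_cache

CALLS = 0  # counts how often we actually hit the "database"

@lru_cache(maxsize=1024)
def expensive_lookup(key: str) -> str:
    """Hypothetical stand-in for a slow DB or disk read."""
    global CALLS
    CALLS += 1
    return key.upper()  # pretend this took 50 ms of DB time

# The first request pays the cost; repeats are served from memory.
expensive_lookup("front-page")
expensive_lookup("front-page")
expensive_lookup("front-page")
assert CALLS == 1  # only one real lookup happened
```

Dedicated caches like memcached apply the same idea across processes and machines, which matters once you have more than one web server.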

Ultimately though, 100k hits/month is not that large. I suspect that these days a typical Web site would need to get beyond 10 million hits/month before it had real problems, assuming you don't do things badly (e.g. if you don't index your database queries then of course you'll have problems, but they have nothing to do with hits/month).
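The indexing aside can be shown concretely with SQLite's query planner (the table and column names below are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (path TEXT, ts INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?)",
                 [("/page%d" % i, i) for i in range(1000)])

# Without an index, a lookup by path scans the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM hits WHERE path = '/page42'"
).fetchone()

# With an index, the same query becomes a B-tree search.
conn.execute("CREATE INDEX idx_hits_path ON hits (path)")
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM hits WHERE path = '/page42'"
).fetchone()

print(plan_before[-1])  # typically reports a full SCAN of the table
print(plan_after[-1])   # typically reports a SEARCH using the index
```

On a thousand rows the difference is invisible; on millions of rows it is the difference between milliseconds and seconds per query.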

I would say redundancy is a far bigger headache than scalability. The issues involved in having redundant links, monitoring processes for system failure, having and maintaining a DR (disaster recovery) site, dealing with the issues that entails (like split-brain clustering), etc are far more difficult and tedious.

Solution 2:

Use Varnish or Squid. Both are web accelerators and both are excellent for static media files.

If you scale out, you can even have one machine for the web cache and one machine for the dynamic content.
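A minimal sketch of what that split might look like in Varnish (the backend address, URL pattern, and TTL here are all assumptions, not anything from the original setup):

```
# /etc/varnish/default.vcl -- illustrative sketch, VCL 4.0 syntax
vcl 4.0;

backend app {
    .host = "10.0.0.2";   # the dynamic-content machine (assumed address)
    .port = "80";
}

sub vcl_backend_response {
    # Cache static media aggressively; extensions and TTL are assumptions.
    if (bereq.url ~ "\.(jpg|png|gif|mp3|mp4|css|js)$") {
        set beresp.ttl = 24h;
    }
}
```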

Alternatively, you could try tweaking Apache using these modules: mod_expires, mod_headers, mod_cache, mod_file_cache, mod_mem_cache.
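For the mod_expires/mod_headers part, a hedged sketch of what the configuration might look like (the MIME types, paths, and TTLs here are assumptions):

```
# httpd.conf excerpt -- a sketch, assuming the modules are loaded
<IfModule mod_expires.c>
    ExpiresActive On
    # Let browsers and proxies keep static media for a month (assumed TTL).
    ExpiresByType image/png  "access plus 1 month"
    ExpiresByType audio/mpeg "access plus 1 month"
    ExpiresByType video/mp4  "access plus 1 month"
</IfModule>

<IfModule mod_headers.c>
    # Mark a static path (hypothetical) as publicly cacheable.
    <LocationMatch "^/media/">
        Header set Cache-Control "public, max-age=2592000"
    </LocationMatch>
</IfModule>
```

Correct Expires/Cache-Control headers mean many requests never reach your server at all, which is the cheapest scaling win available.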

Solution 3:

Depends a little bit on what sort of database handling you're doing. If you are serving static or near-static content, you will scale well beyond 100K/mo with very modest hardware.

Complex database-driven sites such as forums can be a larger problem: you will need to look into database replication, reverse proxies in front of the main website to act as additional caches, and additional load-balanced web servers. Most 'heavy' DB-using software also supports tools such as memcached, which can be used to take common requests off the database.
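The memcached pattern being described is read-through caching: check the cache first, fall back to the database on a miss, then populate the cache for the next request. The sketch below uses a plain dict in place of a real memcached client (a real deployment would use a client library such as pymemcache), but the flow is the same:

```python
cache = {}          # stands in for a memcached instance
DB_QUERIES = 0      # counts how often we fall through to the database

def fetch_thread(thread_id):
    """Hypothetical forum-thread lookup with a read-through cache."""
    global DB_QUERIES
    key = f"thread:{thread_id}"
    if key in cache:
        return cache[key]          # cache hit: no database work
    DB_QUERIES += 1                # cache miss: query the database
    row = {"id": thread_id, "title": f"Thread {thread_id}"}  # fake DB row
    cache[key] = row               # populate for subsequent requests
    return row

# Three requests for the same hot thread cost one database query.
fetch_thread(1)
fetch_thread(1)
fetch_thread(1)
assert DB_QUERIES == 1
```

On a forum, where a handful of hot threads receive most of the reads, this pattern alone can cut database load by an order of magnitude.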

Frankly though, I think you are probably overestimating demand right now - 100K is well within what a single machine can handle. Once you break 10M, then you need to look at more complex solutions such as the ones I have outlined above.

My personal advice is to always favour parallel machines over single complex ones - not only does it end up being cheaper, but it gives you a lot more room to grow when the time comes. Expanding already expensive hardware gets you into the realm of $25,000 servers - where 'bang for buck' disintegrates.

Solution 4:

Once you move beyond 100k hits/month

Is NOTHING!!! If you have trouble serving 100k hits per month, you are in trouble. Generally, even the most basic systems should have no problem serving 100K per day.

This goes for LAMP stacks and CMS systems alike.

My situ: Tons of static media (audio/video/images) being served off of S3/CDN

There are very good web servers for this purpose: lighttpd, which has served YouTube's media streaming; nginx is good as well.

These are lightweight web servers that scale extremely well with high loads.
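A minimal nginx server block for static media might look like this (the hostname, paths, and TTL are assumptions for illustration):

```
# nginx.conf excerpt -- illustrative sketch only
server {
    listen 80;
    server_name media.example.com;

    location /media/ {
        root /var/www;
        # sendfile lets the kernel copy file data straight to the
        # socket, which is a big part of why nginx is cheap for
        # static files.
        sendfile on;
        tcp_nopush on;
        expires 30d;       # let clients cache for a month (assumed TTL)
        access_log off;    # skip per-hit log writes for media
    }
}
```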

Even switching Apache's MPM from prefork to worker would help (though lighttpd is still much better).

Bottom line: these loads are far from being high.