Is this a reasonable architecture for a newspaper site?
We run a newspaper-style site and are consolidating our architecture from one that grew amorphously into a more scalable, resilient solution.
I was thinking of the following:
internet
    |
hardware firewall
    |
hardware load balancer ---- control server (Nagios, mail server & misc)
    |
pair of nginx load-balancing reverse caching proxies
    |                      \
pair of Apache              pair of MogileFS storage nodes
app servers                 and MogileFS trackers
    |
pair of MySQL DBs (master/slave)
and MogileFS DB
All machines will run 64-bit CentOS.
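For the MySQL master/slave pair, a minimal replication sketch could look like the following (classic binlog replication; server IDs and the database name are placeholders, and these are two separate my.cnf files):

```ini
## master: /etc/my.cnf (illustrative)
[mysqld]
server-id    = 1
log-bin      = mysql-bin
binlog-do-db = cms           # placeholder database name

## slave: /etc/my.cnf (illustrative)
[mysqld]
server-id    = 2
relay-log    = mysql-relay-bin
read-only    = 1
```

On the slave, a `CHANGE MASTER TO ...` statement followed by `START SLAVE` wires the pair together.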
We need to be able to service 7 simultaneous users on the app servers and serve 840 static files per second, so I was thinking of speccing things out like this:
- mogilefs storage nodes - 2GB RAM, Intel Atom (1.6GHz)
- app servers - 8GB RAM, AMD Athlon II X2 (2.8GHz)
- reverse proxies & control server - 4GB RAM, AMD Athlon II X2 (2.8GHz)
- dbs - 8GB RAM, AMD Phenom II X6 (2.8GHz)
All would have 7.2k rpm disks. There isn't a huge amount of data in the database, so it can basically all be cached in buffers, and we only have around a 15% memcached miss rate, so the load on the DB is light.
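As a rough sanity check on these numbers (the 50 KB average static-file size is an assumption for illustration, not part of the spec):

```python
# Back-of-envelope sizing check for the figures above.
static_rps = 840     # static files served per second
avg_file_kb = 50     # assumed average static file size (not from the spec)

# Bandwidth needed for static content, in megabits per second
static_mbps = static_rps * avg_file_kb * 8 / 1000
print(f"static bandwidth: ~{static_mbps:.0f} Mbit/s")   # ~336 Mbit/s

# DB load: with a 15% memcached miss rate, only misses touch the database
dynamic_rps = 7
db_rps = dynamic_rps * 0.15
print(f"db-bound requests: ~{db_rps:.2f}/s")            # roughly 1/s
```

At an assumed 50 KB per file the static traffic fits comfortably inside a single gigabit link, and the cache-miss traffic reaching the database is on the order of one query-generating request per second.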
A future stage would be round-robin DNS with everything mirrored to a different data centre.
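The round-robin stage could be as simple as multiple A records per name; a hypothetical zone fragment (documentation addresses, not real ones):

```
static    IN  A  203.0.113.10    ; data centre 1 (placeholder)
static    IN  A  198.51.100.10   ; data centre 2 (placeholder)
```

Resolvers rotate through the answers, which gives coarse load spreading but no health checking, so it complements rather than replaces the load balancers.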
Is there anything missing from this topology? Has anyone done anything similar with any of the components? Do the machines seem like they're under-/over-specced?
Thanks
EDIT
A bit more info:
7 simultaneous page views per second to be served by Apache - a lot of the CMS content is cached anyway, on disk and in memcached where possible. 840 static files need serving per second, but this may be a little high, since with far-future expiry dates only a fraction of page views will arrive with cold client caches.
Only admins will upload static content to the MogileFS storage nodes; they might upload ~100 files per day. I'm new to MogileFS - the nodes will just use commodity disks (7.2k rpm).
This content will then be accessed via http://static*.ourdomain... Nginx will proxy requests for this content and cache it locally, so while the first retrieval may be a little slow, subsequent retrievals will come from the nginx cache.
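A minimal sketch of that nginx caching layer (cache path, zone name, hostnames, and upstream addresses are all placeholders):

```nginx
# Illustrative reverse-proxy cache for the static.* hosts.
proxy_cache_path /var/cache/nginx/static levels=1:2
                 keys_zone=static_cache:64m max_size=10g inactive=7d;

upstream mogile_frontend {
    server 10.0.0.11;    # storage node 1 (placeholder)
    server 10.0.0.12;    # storage node 2 (placeholder)
}

server {
    listen 80;
    server_name static1.example.com static2.example.com;

    location / {
        proxy_pass http://mogile_frontend;
        proxy_cache static_cache;
        proxy_cache_valid 200 7d;   # keep successful responses for a week
        expires max;                # far-future client expiry, as noted above
    }
}
```

With this shape, only the first request for each object goes back to the storage nodes; repeats are served from the proxy's local disk cache.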
Solution 1:
This is a bit too general to answer as a simple question. You will need to provide a lot more input on your proposed solution and on the load:
- What load (pages/cached pages/assets) will be served by which software in this stack (nginx, MogileFS, local FS, Apache)? What will the load balancer do, and what type is it?
- What CMS will you be using? How does it interact with mogile? What kind of storage will your mogilefs run on?
- While you can run MogileFS happily on 2 GB nodes and Apache on 4 GB, I would not skimp on RAM. More memory will make a lot of things smoother.
- You don't mention CPUs; this is even more important in the CMS picture.
Also, I don't see any memcached in there; depending on the setup that could be useful.
Seven simultaneous users does not sound like a lot; how many page views per second is that, in your view?
Edit to reflect the new info:
There are a lot of details to flesh out, but this appears reasonable. A lot will depend on how you configure the nginx caching and the CMS. Keep the network in mind as well; I'd suggest at least gigabit.
I'm a bit concerned about MogileFS performance. If you are still in the design phase, I would suggest looking at alternatives (maybe direct filesystem replication) or at future migration scenarios, depending on your requirements.
Also, your load balancer is presently a very high-level element in the design. Until you are very sure of the requirements in terms of performance and features, I'd leave all the options on the table there.
Solution 2:
You're doing ~7 page req/s from the (dynamic) web servers and ~850 req/s for (small-file) static content, and for this you need a multi-layered architecture with ~10 servers?
Just off the top of my head, that sounds like far too much hardware for that load. Either you're overbuilding, your site has some very slow code, or something else is going on.
I would propose to benchmark your application thoroughly, and from that build an estimate on what hardware you need for your load.
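To put the stated load in perspective, Little's law (in-flight requests = arrival rate x average service time) shows how little concurrency these rates actually imply; the 200 ms dynamic page time and 5 ms static file time are assumed service times for illustration, not measurements:

```python
# Little's law: average number of in-flight requests equals
# arrival rate (req/s) times average service time (seconds).
def concurrency(requests_per_sec: float, service_time_sec: float) -> float:
    return requests_per_sec * service_time_sec

dynamic = concurrency(7, 0.200)    # 7 pages/s at an assumed 200 ms each
static = concurrency(840, 0.005)   # 840 files/s at an assumed 5 ms each

print(f"dynamic in-flight: {dynamic:.1f}")   # 1.4
print(f"static in-flight: {static:.1f}")     # 4.2
```

Single-digit in-flight requests at these rates reinforces the point: benchmark first, because the load itself is modest and may not justify ten machines.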
A few thoughts:
Having 2 load-balancing layers adds complexity; is that needed? How about just one hardware load balancer and a single cache server (Squid or Varnish)?
Never use Atom CPUs for real servers, they are way underpowered.
I don't see why you want to use old desktop-class CPUs like dual-core Athlons. Modern quad-core server CPUs are at least 2x faster in real use. Using modern, more powerful hardware would allow you to consolidate layers and simplify your architecture.
MogileFS is probably great; I don't know much about it beyond its origin and that it has been in heavy use for years with great success. But why set up a technology you're not familiar with just to scale to 2 servers? If you just need the performance level of 2 servers with Intel Atom CPUs, ditch that config and get a single modern quad-core server with a fast disk subsystem (4- or 8-disk RAID 10, or SSDs) instead.
Recommendations:
- Benchmark your own application and get the best metrics you can for your real-life cache hit ratio.
- Maybe find a consultant who has set up something like this several times before, and work with them on the final design?
Your architecture above is sound and well considered. But get some numbers for the real-life performance of the individual parts. :-)