My server setup for a heavily used API

I will soon be buying a bunch of servers for an application that I am about to launch but I have concerns about my setup. I appreciate any feedback I get.

I have an application which will make use of an API that I wrote. Other users/developers will also make use of this API. The API server will receive requests and relay them on to worker servers. The API will only hold a mysql db of requests for logging purposes, authentication and for rate limiting.

Each worker server does a different job and in the future to scale, I will add more worker servers to be available to take on jobs. The API config file will be edited to take note of the new worker servers. The worker servers will do some processing and some will save a path to an image to local database to be later retrieved by the API to be viewed on my application, some will return strings of the outcome of a process and save that to a local database.

Does this setup look efficient to you? Is there a better way to restructure this? What issues should I consider? Please see image below, I hope it aids understanding.enter image description here


Solution 1:

Higher Availability

As Chris mentions, your API server is the single point of failure in your layout. What you're setting up is a message queuing infrastructure, something many people have implemented before.

Continue down the same path

You mention receiving requests on the API server and insert the job into a MySQL DB running on each server. If you want to continue on this path, I suggest removing the API server layer, and design the Workers to each accept commands directly from your API Users. You could use something as simple as round-robin DNS to distribute each API User connection directly to one of the available worker nodes (and retry if a connection isn't successful).

Use a Message Queue Server

More robust message queuing infrastructures use software designed for this purpose like ActiveMQ. You could use ActiveMQ's RESTful API to accept POST requests from API Users, and idle workers can GET the next message on the queue. However, this is probably overkill for your needs - it is designed for latency, speed, and millions of messages a second.

Use Zookeeper

As a middle ground, you may want to look at Zookeeper, even though it isn't specifically a message queue server. We use at $work for this exact purpose. We have a set of three servers (analogous to your API server) that run the Zookeeper server software, and have a web frontend for handling requests from users and applications. The web frontend, as well as the Zookeeper backend connection to the workers, have a load balancer to make sure we continue processing the queue, even if a server is down for maintenance. When the work is done, the worker tells the Zookeeper cluster that the job is complete. If a worker dies, that job will be sent to another work to complete.

Other concerns

  • Make sure jobs complete in the event that a worker isn't responding
  • How will the API know that a job is complete, and to retrieve it from the worker's database?
  • Try to reduce the complexity. Do you need an independent MySQL server on each worker node, or could they talk to the MySQL server (or replicated MySQL Cluster) on the API server(s)?
  • Security. Can anyone submit a job? Is there authentication?
  • Which worker should get the next job? You don't mention whether the tasks are expected to take 10ms or 1 hour. If they are fast, you should remove layers to keep latency down. If they are slow, you should be very careful to make sure shorter requests don't get stuck behind a few long running ones.

Solution 2:

The biggest issue I see is the lack of failover planning.

Your API server is a large single point of failure. If it goes down, then nothing works even if your worker servers are still functional. Additionally, if a worker server goes down, then the service that server provides is no longer available.

I suggest you look at the Linux Virtual Server project (http://www.linuxvirtualserver.org/) to get an idea of how load balancing and failover works, and to get an idea of how these can benefit your design.

There are many ways to structure your system. Which way is better is a subjective call that is best answered by you. I suggest you do some research; weigh the tradeoffs of the different methods. If you need for information about an implantation method, then submit a new question.