Kafka Messaging vs. REST Calls
In the microservices world, I'm seeing a lot of designs at my workplace that use Kafka messaging where similar results could be achieved with REST API calls between microservices. Technically, you could stop using REST API calls altogether and use Kafka messaging instead. I'd really like to know the best practice, the pros and cons, and when to use API calls between microservices versus Kafka messaging.
Let's take a real-life example:
I have an inventory service and a vendor service. Every day the vendor service calls the vendor API to get new items, and these need to be moved into the inventory service. The number of items can be up to 10,000 objects.
For this use case, is it better to:
- After getting new data from the vendor API, call the REST API of the inventory service to store the new items.
- After getting new data from the vendor API, send the items as messages to a Kafka topic, to be consumed by the inventory service.
Which way would you choose, and what are the considerations?
Solution 1:
Gist (for those who want just the gist)
- Kafka - Publish & Subscribe (just process the pipeline, will notify once the job is done)
- REST - Request & Await response (on-demand)

- Kafka - Publish once - Subscribe n times (by n components).
- REST - Request once, get the response once. Deal over.

- Kafka - Data is stored in topic. Seek back & forth (offsets) whenever you want till the topic is retained.
- REST - Once the response is over, it is over. Manually employ a database to store the processed data.

- Kafka - Split the processing, have intermediate data stored in intermediate topics (for speed and fault-tolerance)
- REST - Take the data, process it all at once OR if you wish to break it down, don't forget to take care of your OWN intermediate data stores.

- Kafka - The one who makes the request is typically not interested in a response (other than an acknowledgement that the message was sent).
- REST - When I make a request, I typically expect a response (not just confirmation that you received the request, but something meaningful to me, such as a computed result).
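To make the "publish once, subscribe n times" and offset points concrete, here is a minimal in-memory sketch. It is not a Kafka client: `TopicLog`, `Consumer`, and the item names are hypothetical stand-ins, and retention is ignored. It only illustrates that reading from a log does not remove data and that each subscriber keeps its own position.

```python
# In-memory stand-in for a topic. All names are hypothetical, not a Kafka API.
class TopicLog:
    """Append-only log: messages stay readable after they are consumed."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        """Append a message and return its offset."""
        self._messages.append(message)
        return len(self._messages) - 1

    def read_from(self, offset):
        """Return every stored message from `offset` onward."""
        return list(self._messages[offset:])


class Consumer:
    """Each consumer tracks its own offset, so consumers never interfere."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        """Read everything published since this consumer's current offset."""
        batch = self._log.read_from(self.offset)
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        """Move this consumer's position backward or forward."""
        self.offset = offset


log = TopicLog()
for item in ["item-1", "item-2", "item-3"]:
    log.publish(item)            # published once

inventory = Consumer(log)
reporting = Consumer(log)

inventory_batch = inventory.poll()   # first subscriber reads all three
reporting_batch = reporting.poll()   # second subscriber reads the same three

inventory.seek(1)                    # rewind to offset 1
replayed = inventory.poll()          # re-reads item-2 and item-3
```

With a plain REST call, the equivalent of `replayed` would require the caller to have stored the response somewhere itself.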
Q&A style
Is your data streaming?
If the data keeps on coming and you have a pipeline to execute, Kafka is best.
Do you need a request-response model?
If the user requests something and waits for a response, then REST is best.
Kafka (or any other streaming platform) is typically used for pipelines, i.e., where we have a forward flow of data.
Data comes to Kafka and from there it goes through component1, component2 and so on and finally (typically) lands in a database.
To get information on demand, we need a data store (a database) that we can query. In that case we provide a REST interface that the user can invoke to get the data they want.
Regarding your example,
Everyday vendor service calls the vendor API to get new items and these need to be moved into inventory service
Questions & Answers
Is your vendor API using REST?
Then you need to pull the data and push it to Kafka. From there, your inventory service (or any other downstream service) subscribes to that topic and runs its own processing logic.
The advantage here is that any other service that needs vendor data can be added as a consumer of the vendor topic.
Moreover, the vendor data remains available even after your inventory service has processed it.
If you use REST for this, every component that needs vendor data has to call the vendor API itself; with Kafka this becomes trivial, because each component just subscribes to the vendor topic.
Do you want the inventory to be queried?
Store it in a database after processing it through Kafka, and provide a REST API on top of that. This is needed because Kafka is essentially a log; to make the data queryable you need a database.
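A rough sketch of that flow, under loud assumptions: `fetch_vendor_items` is a hypothetical stub for the vendor API, a plain list stands in for the Kafka topic, and an in-memory SQLite table stands in for the inventory database. `get_item` is the lookup a REST endpoint would wrap.

```python
import json
import sqlite3


def fetch_vendor_items():
    """Hypothetical stand-in for the daily vendor API call."""
    return [
        {"sku": "A-100", "name": "widget", "qty": 5},
        {"sku": "B-200", "name": "gadget", "qty": 12},
    ]


vendor_topic = []  # plain list standing in for a Kafka topic


def publish_vendor_items(topic):
    """Vendor service: pull from the vendor API, push each item to the topic."""
    for item in fetch_vendor_items():
        topic.append(json.dumps(item))


def consume_into_inventory(topic, db):
    """Inventory service: read the topic and upsert each item into the database."""
    for raw in topic:
        item = json.loads(raw)
        db.execute(
            "INSERT OR REPLACE INTO inventory (sku, name, qty) VALUES (?, ?, ?)",
            (item["sku"], item["name"], item["qty"]),
        )
    db.commit()


def get_item(db, sku):
    """Query path: what a REST endpoint on top of the database would return."""
    row = db.execute(
        "SELECT sku, name, qty FROM inventory WHERE sku = ?", (sku,)
    ).fetchone()
    return None if row is None else {"sku": row[0], "name": row[1], "qty": row[2]}


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, name TEXT, qty INTEGER)")

publish_vendor_items(vendor_topic)
consume_into_inventory(vendor_topic, db)
```

Note that `vendor_topic` still holds both messages after the inventory service has consumed them, so another consumer could be added later without calling the vendor API again.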
Solution 2:
Microservices architecture advocates independent, autonomous services that can operate on their own. Let's look at why we need message queues.
HTTP protocol is sync
There is a very widespread misconception that HTTP is async. HTTP is a synchronous protocol, although your client can handle it asynchronously. For example, when you call a service over HTTP, your HTTP client may schedule the call on a background thread (async). However, the HTTP call itself waits until it either times out or the response comes back, and during all that time the HTTP call chain is waiting synchronously. If you have hundreds of requests at a time, you can imagine how many HTTP calls are pending at once, and you may run out of sockets.
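The point about pending calls can be illustrated with a small simulation. This is only a sketch: `slow_call` is a hypothetical stand-in that sleeps instead of doing real network I/O, and the counter stands in for open sockets. Even though the caller is written asynchronously, every call stays "in flight" until its response arrives.

```python
import asyncio


async def slow_call(state, delay):
    """Hypothetical stand-in for an HTTP call that holds a socket until it returns."""
    state["in_flight"] += 1
    state["peak"] = max(state["peak"], state["in_flight"])
    try:
        await asyncio.sleep(delay)   # waiting for the response (or a timeout)
    finally:
        state["in_flight"] -= 1      # the socket is released only now


async def main(request_count):
    state = {"in_flight": 0, "peak": 0}
    # Fire all calls concurrently; none can finish before its delay elapses.
    await asyncio.gather(*(slow_call(state, 0.01) for _ in range(request_count)))
    return state


state = asyncio.run(main(100))
```

Here `state["peak"]` records how many calls were open at the same moment, which is the number a real client would need sockets for.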
AMQP
In a microservices architecture we prefer AMQP (Advanced Message Queuing Protocol), which means the service drops the message in a queue and forgets about it. This is a truly asynchronous transport, since your service is done once it has dropped the message in the queue, and interested services will pick it up.
This type of protocol is preferred because you can scale without worrying about other services being down; they will eventually get the message/event/data.
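A minimal sketch of that "drop it and forget it" behaviour, using an in-memory queue rather than a real broker. `MessageQueue`, `send`, and `drain` are hypothetical names chosen for illustration, not an AMQP client API.

```python
from collections import deque


class MessageQueue:
    """In-memory stand-in for a broker queue (hypothetical, not an AMQP client)."""

    def __init__(self):
        self._pending = deque()

    def send(self, message):
        """Producer side: enqueue and return immediately, no consumer needed."""
        self._pending.append(message)

    def drain(self, handler):
        """Consumer side: process everything that accumulated, oldest first."""
        handled = 0
        while self._pending:
            handler(self._pending.popleft())
            handled += 1
        return handled


queue = MessageQueue()

# The consumer is "down" here, yet the producer is not blocked by that.
for order_id in (1, 2, 3):
    queue.send({"order_id": order_id})

# Later the consumer comes back up and receives everything it missed.
received = []
handled_count = queue.drain(received.append)
```

With a direct HTTP call, each of those three sends would have failed or timed out while the consumer was unavailable.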
So it really depends on your particular case. HTTP is easy to implement, but it doesn't scale as well. Messaging services come with their own challenges, such as message ordering and managing workers, but they make the architecture scalable and are the preferred way. For write operations, always prefer a queue; for read operations you can use HTTP, but make sure you are not building a long chain where one service calls another, which in turn calls another.
Hope that helps!
Solution 3:
Main benefit with Kafka:
With direct REST calls to each service, if you have N services that all need to talk to each other, that's around N^2/2 connections. You might also need to build a load balancer in front of the services that get lots of requests, and maybe a queuing/buffering system within the service to queue up its requests (lol)
With Kafka, you just need N topics. It already provides the queuing system, by definition.
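The arithmetic behind "around N^2/2" can be checked quickly. Counting one link per pair of services gives N(N-1)/2, which approaches N^2/2 as N grows. The function names below are made up for this illustration, and the one-topic-per-service figure is simply the assumption stated above.

```python
def point_to_point_links(service_count):
    """Number of distinct service pairs, i.e. N * (N - 1) / 2."""
    return service_count * (service_count - 1) // 2


def topics_needed(service_count):
    """One topic per service, as assumed in the answer above."""
    return service_count


# Compare growth for a few system sizes.
comparison = {n: (point_to_point_links(n), topics_needed(n)) for n in (5, 10, 50)}
```

For 50 services that is 1225 pairwise links against 50 topics.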
Main drawback with Kafka:
The services aren't waiting for a response to their request. It's harder to associate a response with its request once the response does show up in a topic.
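One commonly used way to deal with this (not something this answer prescribes) is to attach a correlation ID to each request and echo it back in the reply. The sketch below uses plain lists as stand-ins for a request topic and a reply topic; every name in it is hypothetical.

```python
import uuid

request_topic = []   # stand-in for the topic the responder consumes
reply_topic = []     # stand-in for the topic the requester consumes
pending = {}         # correlation_id -> original payload, kept by the requester


def send_request(payload):
    """Requester: tag the message with a fresh ID and remember it."""
    correlation_id = str(uuid.uuid4())
    pending[correlation_id] = payload
    request_topic.append({"correlation_id": correlation_id, "payload": payload})
    return correlation_id


def serve_requests():
    """Responder: reply in reverse order to show that ordering is not relied on."""
    for message in reversed(request_topic):
        reply_topic.append(
            {
                "correlation_id": message["correlation_id"],
                "result": message["payload"].upper(),
            }
        )
    request_topic.clear()


def collect_replies():
    """Requester: match each reply to its request via the correlation ID."""
    matched = {}
    for reply in reply_topic:
        original = pending.pop(reply["correlation_id"], None)
        if original is not None:
            matched[original] = reply["result"]
    reply_topic.clear()
    return matched


send_request("check stock")
send_request("reserve item")
serve_requests()
replies = collect_replies()
```

This works, but it is bookkeeping that a REST call gives you for free, which is the drawback described above.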