How can I send large messages with Kafka (over 15MB)?
Solution 1:
You need to adjust three (or four) properties:
- Consumer side: fetch.message.max.bytes - this determines the largest size of a message that can be fetched by the consumer.
- Broker side: replica.fetch.max.bytes - this allows the replicas in the brokers to send messages within the cluster and makes sure the messages are replicated correctly. If this is too small, the message will never be replicated, so it will never be committed (fully replicated), and therefore the consumer will never see it.
- Broker side: message.max.bytes - this is the largest size of a message that the broker will accept from a producer.
- Broker side (per topic): max.message.bytes - this is the largest size of a message the broker will allow to be appended to the topic. In current Kafka versions, this limit applies to the record batch after compression, if compression is enabled. (Defaults to the broker's message.max.bytes.)
I found out the hard way about number 2 (replica.fetch.max.bytes): you don't get ANY exceptions, messages, or warnings from Kafka, so be sure to consider this when you are sending large messages.
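As a minimal sketch of that check (plain Python, no Kafka client involved), the snippet below compares the four limits against the largest message you expect to send and lists any that are too small. The settings dictionary and its values are hypothetical; replica.fetch.max.bytes is deliberately undersized to mirror the silent failure described above. Leaving some headroom over the raw message size is sensible, since the broker limits apply to record batches, which carry a little overhead.

```python
# Minimal sketch: flag size limits that are smaller than the largest message.
# The settings below are hypothetical example values, not recommendations.

MB = 1024 * 1024

settings = {
    "fetch.message.max.bytes": 20 * MB,  # consumer side
    "replica.fetch.max.bytes": 10 * MB,  # broker side (deliberately too small)
    "message.max.bytes": 20 * MB,        # broker side
    "max.message.bytes": 20 * MB,        # broker side, per topic
}


def limits_too_small(settings, largest_message):
    """Return, sorted by name, the properties whose limit is below largest_message."""
    return sorted(name for name, limit in settings.items() if limit < largest_message)


print(limits_too_small(settings, 15 * MB))  # ['replica.fetch.max.bytes']
```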
Solution 2:
Minor changes required for Kafka 0.10 and the new consumer compared to laughing_man's answer:
- Broker: No changes; you still need to increase the properties message.max.bytes and replica.fetch.max.bytes. message.max.bytes has to be equal to or smaller(*) than replica.fetch.max.bytes.
- Producer: Increase max.request.size to send the larger message.
- Consumer: Increase max.partition.fetch.bytes to receive larger messages.
(*) Read the comments to learn more about message.max.bytes <= replica.fetch.max.bytes.
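For Kafka 0.10+ the same idea can be sketched per component. The helper names (large_message_configs, check_broker, to_properties) and the 1MB headroom are illustrative assumptions; the property names are the ones listed above, and the check only encodes this answer's message.max.bytes <= replica.fetch.max.bytes rule. The output uses the key=value .properties syntax of Kafka's config files such as server.properties.

```python
# Minimal sketch: size-related settings per component for Kafka 0.10+.
# Helper names and the headroom value are hypothetical choices.

MB = 1024 * 1024


def large_message_configs(max_message, headroom=1 * MB):
    """Return size-related settings for broker, producer and consumer."""
    limit = max_message + headroom
    return {
        "broker": {
            "message.max.bytes": limit,
            "replica.fetch.max.bytes": limit,  # rule above: keep >= message.max.bytes
        },
        "producer": {"max.request.size": limit},
        "consumer": {"max.partition.fetch.bytes": limit},
    }


def check_broker(configs):
    """Raise if the broker settings break the message.max.bytes rule."""
    broker = configs["broker"]
    if broker["message.max.bytes"] > broker["replica.fetch.max.bytes"]:
        raise ValueError("message.max.bytes must be <= replica.fetch.max.bytes")


def to_properties(section):
    """Render one component's settings as key=value lines."""
    return "\n".join(f"{key}={value}" for key, value in sorted(section.items()))


configs = large_message_configs(15 * MB)
check_broker(configs)
print(to_properties(configs["broker"]))
# message.max.bytes=16777216
# replica.fetch.max.bytes=16777216
```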
Solution 3:
The answer from @laughing_man is quite accurate, but I still wanted to share a recommendation I learned from Kafka expert Stephane Maarek. We have applied this solution in our live systems.
Kafka isn’t meant to handle large messages.
Your API should use cloud storage (for example, AWS S3) and simply push a reference to the stored object to Kafka or any other message broker. You'll need to find a place to save your data, whether that is a network drive or something else entirely, but it shouldn't be a message broker.
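This approach is often called the claim-check pattern. Below is a minimal sketch of it in plain Python: InMemoryObjectStore is a hypothetical stand-in for S3 or a network drive, and `topic` is an ordinary list standing in for a Kafka topic, so no real storage or Kafka API is shown. The reference carries a SHA-256 checksum so the consumer can verify it fetched the right payload.

```python
# Minimal sketch of "store the payload elsewhere, send a reference".
# InMemoryObjectStore is a hypothetical stand-in for S3 or a network drive,
# and `topic` is a plain list standing in for a Kafka topic.
import hashlib
import json


class InMemoryObjectStore:
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data
        return "memory://" + key

    def get(self, uri):
        return self._blobs[uri[len("memory://"):]]


def publish_large(payload, store, topic):
    """Upload the payload, then publish only a small JSON reference."""
    digest = hashlib.sha256(payload).hexdigest()
    uri = store.put(digest, payload)
    reference = {"uri": uri, "size": len(payload), "sha256": digest}
    topic.append(json.dumps(reference).encode("utf-8"))


def consume(message, store):
    """Resolve a reference message back to the original payload."""
    reference = json.loads(message)
    payload = store.get(reference["uri"])
    if hashlib.sha256(payload).hexdigest() != reference["sha256"]:
        raise ValueError("payload does not match the checksum in the reference")
    return payload


store, topic = InMemoryObjectStore(), []
big = b"x" * (20 * 1024 * 1024)  # 20MB, far above Kafka's ~1MB default
publish_large(big, store, topic)
print(len(topic[0]))  # 180 bytes go through the "broker" instead of 20MB
assert consume(topic[0], store) == big
```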
If you don't want to go with the recommended, reliable solution above: the default maximum message size in Apache Kafka is about 1MB (the broker setting is called message.max.bytes). If you really need to, you can increase that size, making sure to also increase the network buffers for your producers and consumers.
And if you do split your message, make sure every part has exactly the same key so that all parts are pushed to the same partition, and have each part carry a “part id” so that your consumer can fully reconstruct the message.
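Here is a minimal sketch of the split-and-reassemble idea, assuming each part would be sent as its own Kafka record whose record key is Chunk.key (the actual producer/consumer calls are not shown). With the default partitioner, records with the same key go to the same partition, and Kafka preserves order within a partition; the reassembler indexes parts by part_id anyway, so it does not depend on arrival order. Chunk, split_message and Reassembler are hypothetical names.

```python
# Minimal sketch: split one payload into keyed, numbered parts and
# reassemble it on the consumer side. Sending to Kafka is not shown.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Chunk:
    key: str          # identical for every part, so all parts land in one partition
    message_id: str   # which large message this part belongs to
    part_id: int      # position of this part, starting at 0
    total_parts: int
    data: bytes


def split_message(key: str, message_id: str, payload: bytes, chunk_size: int) -> List[Chunk]:
    pieces = [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)] or [b""]
    return [Chunk(key, message_id, n, len(pieces), piece) for n, piece in enumerate(pieces)]


class Reassembler:
    def __init__(self) -> None:
        self._pending: Dict[str, Dict[int, bytes]] = {}

    def add(self, chunk: Chunk) -> Optional[bytes]:
        """Store a part; return the full payload once every part has arrived."""
        parts = self._pending.setdefault(chunk.message_id, {})
        parts[chunk.part_id] = chunk.data  # a part redelivered before completion just overwrites itself
        if len(parts) < chunk.total_parts:
            return None
        del self._pending[chunk.message_id]
        return b"".join(parts[n] for n in range(chunk.total_parts))


chunks = split_message("order-42", "msg-1", b"abcdefghij", chunk_size=4)
reassembler = Reassembler()
results = [reassembler.add(c) for c in chunks]
print(results[-1])  # b'abcdefghij'
```

A production consumer would also need to expire incomplete messages, for example when a producer dies halfway through sending the parts.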
If the message is text-based, try compressing the data; this may reduce its size, but it won't work miracles.
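To get a feel for how much compression can help, the sketch below uses Python's zlib as a stand-in codec and compares repetitive text with random bytes of the same size. In Kafka itself you would normally let the producer compress batches by setting compression.type (for example gzip, lz4 or zstd) rather than compressing by hand; the sample data here is made up.

```python
# Minimal sketch: how compressible is the payload? zlib stands in for
# whatever codec you would actually use; the sample data is made up.
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, level)) / len(data)


text = ("ts=2024-01-01T00:00:00Z level=INFO msg=heartbeat ok\n" * 10_000).encode()
noise = os.urandom(len(text))  # stands in for already-compressed data such as JPEG or ZIP

print(f"repetitive text: {compression_ratio(text):.4f}")
print(f"random bytes:    {compression_ratio(noise):.4f}")  # about 1.0: no gain
```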
Again, you should use an external system to store that data and push only an external reference to Kafka. This is a very common, widely accepted architecture, and it is the one you should go with.
Keep in mind that Kafka works best when messages are huge in number, not in size.
Source: https://www.quora.com/How-do-I-send-Large-messages-80-MB-in-Kafka