RabbitMq and KStreams for Data Aggregation

I'm trying to solve the problem of data denormalization before indexing to the Elasticsearch. Right now, my Postgres 11 database is configured with pgoutput plugin and Debezium with Postgresql Connector is streaming the log changes to RabbitMq which are then aggregated by doing a reverse lookup on the db and feeding to the Elasticsearch.

Although, this works okay, the lookup at the App layer to aggregate the data is expensive and taking a lot of execution time (the query is already refined but it has about 10 joins making it sloppy).

The other alternative I explored was to use KStreams for data aggregation. My knowledge on Apache Kafka is minimal and thus I'm here. My question here is it a requirement to have Apache Kafka as the broker to be able to utilize the Java KStreams API or can it be leveraged with any broker such as RabbitMq? I'm unsure about this because all the articles talk about Kafka Topics and Key Value pairs which are specific to Apache Kafka.

If there is a better way to solve the data denormalization problem, I'm open to it too.

Thanks

Solution 1:

Kafka Steams is only for Kafka. You're more than welcome to use Kafka Streams between Debezium and the process that consumes any topic (the Postgres connector that writes to RabbitMQ?)

You can use Spark, Flink, or Beam for stream processing on other messaging queues, but Debezium requires Kafka so start with tools around that.

Spark, for example, has an Elasticsearch writer library; not sure about the others.

RabbitMq and KStreams for Data Aggregation

Solution 1:

Related

Recent Posts