How to integrate Oracle and Kafka
I've been trying to find the most efficient and effective way to capture change notifications in a single Oracle 11g R2 instance and deliver those events to an Apache Kafka topic, but I haven't been able to find any simple examples or tutorials along these lines.
I've seen some possibilities on the Oracle side (Streams, Change Data Capture, triggers (yuck), etc.), but I'm still not sure which would be best to pursue.
There is a project on GitHub called mypipe that connects MySQL to Kafka, but I haven't seen anything similar for Oracle. I'm not sure whether it would be best to focus on writing an Oracle package for this or on a layer similar to the mypipe project.
Any recommendations, suggestions or examples would be greatly appreciated. Thank you.
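Whichever capture mechanism is used, a mypipe-style layer mostly has to turn each row change into a Kafka record. Below is a minimal sketch that assumes a hypothetical `ChangeEvent` shape and a `<schema>.<table>` topic naming convention. Neither comes from mypipe or from any Oracle tool. Keying by primary key matters because Kafka only guarantees ordering within a partition, and the default partitioner sends records with the same key to the same partition.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class ChangeEvent:
    schema: str
    table: str
    op: str                            # "insert", "update" or "delete"
    primary_key: Dict[str, Any]        # e.g. {"EMPLOYEE_ID": 100}
    after: Optional[Dict[str, Any]]    # new column values; None for deletes


def to_kafka_record(event: ChangeEvent) -> Tuple[str, bytes, bytes]:
    """Map one row change to a (topic, key, value) triple for a Kafka producer."""
    # One topic per source table, e.g. "hr.employees" (hypothetical convention).
    topic = f"{event.schema}.{event.table}".lower()
    # Key by primary key so all changes to a row go to the same partition,
    # where Kafka preserves their order. sort_keys makes the bytes deterministic.
    key = json.dumps(event.primary_key, sort_keys=True,
                     separators=(",", ":")).encode("utf-8")
    # default=str turns values JSON cannot encode (dates, Decimals) into strings.
    value = json.dumps(asdict(event), sort_keys=True, default=str).encode("utf-8")
    return topic, key, value
```

The resulting triple can be passed to whatever producer client you use, for example confluent-kafka's `Producer.produce(topic, key=key, value=value)`.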
Solution 1:
There is currently just one tool that is open source and has minimal impact on the database: OpenLogReplicator.
the license is GPL, so it is fully open source
it has very low impact on the source database: it requires no extra licensing options, only enabling supplemental logging on the source (as all other replication tools do)
it is written entirely in C++, so it has very low latency and high throughput
it works completely in memory
it supports all Oracle database versions since 11.2.0.1 (11.2, 12.1, 12.2, 18, 19)
It reads the binary format of Oracle redo logs and sends the changes to Kafka. It can run on the database host, but you can also configure it to read the redo logs over sshfs from another host, which keeps the load on the database minimal.
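Whatever produces the events (OpenLogReplicator, GoldenGate or a home-grown layer), a downstream consumer often just replays them onto a target copy of the data. This is a minimal sketch assuming a hypothetical JSON message shape `{"op": ..., "id": ..., "row": ...}`. OpenLogReplicator defines its own output format, so the parsing here is a placeholder to adapt. Fetching the messages from Kafka with a consumer client is left out so the sketch stays self-contained.

```python
import json
from typing import Any, Dict, Iterable, Optional


def apply_changes(messages: Iterable[bytes],
                  table: Optional[Dict[Any, dict]] = None) -> Dict[Any, dict]:
    """Replay change messages, in order, onto an in-memory copy of a table."""
    table = {} if table is None else table
    for raw in messages:
        event = json.loads(raw)  # hypothetical shape: {"op", "id", "row"}
        op, row_id = event["op"], event["id"]
        if op in ("insert", "update"):
            # Upsert the full row image. Upserts and deletes are idempotent, so
            # re-reading a suffix of the stream (at-least-once delivery)
            # converges to the same final state.
            table[row_id] = event["row"]
        elif op == "delete":
            table.pop(row_id, None)
        else:
            raise ValueError(f"unknown op: {op!r}")
    return table
```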
disclaimer #1: I am the author of this solution
disclaimer #2, to other StackOverflow users: please do not delete this answer. This question has many duplicates, but it was the first one asked, so the duplicates should be redirected here and marked as duplicates, not the other way around. I have deleted my answers on the other questions and am leaving this one as the primary answer.
Solution 2:
I think one approach might be to use Oracle GoldenGate for Big Data (I'm researching this myself), although it is most likely a costly solution ($).
https://blogs.oracle.com/dataintegration/entry/introducing_oracle_goldengate_for_big
Let me know if you get anywhere with this. Good luck!