I did a POC in which I read data from Kafka using Spark Streaming. However, as a standard, our organization uses either Apache Flink or a plain Kafka consumer to read data from Apache Kafka, so I need to replace Spark Streaming with one of those. In my use case, I need to read data from Kafka, filter the JSON data, and put the fields into Cassandra. The recommendation is therefore to use a Kafka consumer rather than Flink or another streaming framework, since I don't really need to do any processing on the JSON data. I need your help to understand the questions below:

  1. Using a Kafka consumer, can I achieve the same continuous data read as with Spark Streaming or Flink?

  2. Is a Kafka consumer sufficient for me, considering I need to read data from Kafka, deserialize it using an Avro schema, filter fields, and put them into Cassandra?

  3. A Kafka consumer application can be created using the Kafka Consumer API, right?

  4. Are there any downsides in my case if I just use a Kafka consumer instead of Apache Flink?


Solution 1:

Firstly, let's take a look at Flink's Kafka connector and Spark Streaming with Kafka: both of them use the Kafka Consumer API (either the simple API or the high-level API) internally to consume messages from Apache Kafka for their jobs.
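To make that concrete, here is a minimal sketch of the same kind of poll loop built directly on the Kafka Consumer API in Java. The broker address, consumer group, topic name, and the use of Confluent's `KafkaAvroDeserializer` with a Schema Registry are illustrative assumptions, not details from your setup:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToCassandraConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "cassandra-sink");          // hypothetical consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // Assumes Confluent's Avro deserializer plus a running Schema Registry;
        // any equivalent Avro deserializer would work the same way.
        props.put("value.deserializer",
                "io.confluent.kafka.serializers.KafkaAvroDeserializer");
        props.put("schema.registry.url", "http://localhost:8081");

        try (KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("events")); // hypothetical topic

            // An endless poll loop is what gives you the same continuous
            // consumption that Spark Streaming and Flink run internally.
            while (true) {
                ConsumerRecords<String, Object> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, Object> record : records) {
                    handle(record.value());
                }
            }
        }
    }

    // Placeholder: filter the fields you need from the Avro record and
    // insert them into Cassandra, e.g. with the DataStax Java driver.
    private static void handle(Object avroRecord) {
        // filtering + Cassandra write go here
    }
}
```

Whether this is "sufficient" for you mostly comes down to whether you ever need windowing, state, or exactly-once processing; plain deserialize-filter-insert, as above, does not.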

So, regarding your questions:

  1. Yes. A plain consumer polling in a loop, as in the sketch above, gives you the same continuous read.

  2. Yes. However, if you were to stay on Spark, you could consider the Spark Cassandra Connector, which helps save data into Cassandra efficiently.

  3. Right. A plain application built on the Kafka Consumer API is all you need.

  4. As mentioned above, Flink also uses the Kafka consumer for its jobs. Moreover, it is a distributed stream and batch processing framework, so it helps you process data efficiently after consuming it from Kafka. In your case, to save data into Cassandra, you could consider the Flink Cassandra Connector rather than coding the sink yourself; a sketch follows below.
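For completeness, this is roughly how the Flink route would look, combining the Flink Kafka and Cassandra connectors. The topic, keyspace, table, hosts, filter condition, and the `extractId` helper are all hypothetical, and exact connector class names vary a little between Flink versions:

```java
import java.util.Properties;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class FlinkKafkaToCassandra {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // assumed broker
        props.setProperty("group.id", "flink-cassandra");         // hypothetical group

        // Reads raw JSON strings; for Avro you would plug in a custom
        // DeserializationSchema instead of SimpleStringSchema.
        DataStream<String> raw = env.addSource(
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props));

        // Filter the JSON and map it into tuples matching the target table.
        DataStream<Tuple2<String, String>> rows = raw
                .filter(json -> json.contains("\"type\"")) // placeholder filter
                .map(new MapFunction<String, Tuple2<String, String>>() {
                    @Override
                    public Tuple2<String, String> map(String json) {
                        return Tuple2.of(extractId(json), json);
                    }
                });

        // The connector manages the Cassandra session and writes for you.
        CassandraSink.addSink(rows)
                .setQuery("INSERT INTO mykeyspace.events (id, payload) VALUES (?, ?);")
                .setHost("127.0.0.1")
                .build();

        env.execute("kafka-to-cassandra");
    }

    // Hypothetical helper: derive the row key from the JSON payload.
    private static String extractId(String json) {
        return Integer.toHexString(json.hashCode());
    }
}
```

In short: if all you do is deserialize, filter, and insert, a plain consumer keeps the operational footprint small, while Flink buys you the ready-made sink plus scaling, checkpointing, and fault tolerance should the pipeline ever grow.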