Kafka Connect - how many tasks per connector?
From the documentation and other references, it seems a connector is instantiated with a single task no matter what value is set through the tasks.max property:
- Distributed Official Mongodb Kafka Source Connector with Multiple tasks Not working
- What is the relationship between connectors and tasks in Kafka Connect?

- Does tasks.max have any impact in cases such as failover? Say tasks.max is set to 2 and a JDBC connector runs with a single task: if that task fails, will another one take over?
- What is the significance of distributed mode in this case, given that the connector is effectively created with a single task?
For the MongoDB source connector in the linked question, this is because it uses a single Change Stream cursor. If more than one task read from it, how would you expect them to avoid conflicts, such as reading the same data and writing duplicates into the topic?
Connect runs sources and sinks. Many sources only support a single task, but it depends on their internal threading model; for example, you could have one task per collection/table, but if there's only one unified item to read, such as a change stream or binlog, then there can only be one task. You mentioned JDBC; however, Debezium would be preferred for CDC if it supports your database.
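As a rough illustration of why the configured limit (the tasks.max connector property) is only an upper bound: the framework asks the connector for a list of task configurations via taskConfigs(maxTasks) and starts one task per entry. The Python below is a simplified model of that idea, not the code of any real connector; the function name task_configs, the "tables" key, and the round-robin grouping are all assumptions made for the example.

```python
from typing import Dict, List


def task_configs(tables: List[str], max_tasks: int) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a connector's taskConfigs(maxTasks)."""
    if max_tasks < 1:
        raise ValueError("max_tasks must be at least 1")
    # Never create more groups than there are independent units of work.
    num_groups = min(len(tables), max_tasks)
    groups: List[List[str]] = [[] for _ in range(num_groups)]
    # Round-robin assignment so each table belongs to exactly one task.
    for i, table in enumerate(tables):
        groups[i % num_groups].append(table)
    # One config per group; the framework would start one task per config.
    return [{"tables": ",".join(group)} for group in groups]


# Three tables, limit of 2: two tasks share the work.
print(task_configs(["orders", "customers", "payments"], 2))
# One indivisible source (e.g. a single change stream), limit of 2: one task.
print(task_configs(["change-stream"], 2))
```

With a single indivisible source, raising the limit changes nothing, which matches the behaviour described in the question.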
Distribution is also for fault tolerance, not just scalability. Only some exceptions are recoverable, though, and only then can the task be restarted on another node.
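To check which tasks have actually failed, the Connect REST API exposes GET /connectors/{name}/status, and a single task can be restarted with POST /connectors/{name}/tasks/{id}/restart. The sketch below only parses a status document and builds the restart paths; it makes no HTTP calls. The helper name failed_task_restart_paths and the sample payload (connector name, worker addresses) are made up for illustration.

```python
from typing import Any, Dict, List


def failed_task_restart_paths(status: Dict[str, Any]) -> List[str]:
    """Return the REST restart path for every task reported as FAILED."""
    name = status["name"]
    return [
        f"/connectors/{name}/tasks/{task['id']}/restart"
        for task in status.get("tasks", [])
        if task.get("state") == "FAILED"
    ]


# Hypothetical status payload, shaped like the response of
# GET /connectors/{name}/status; the values are invented.
sample_status = {
    "name": "my-jdbc-source",
    "connector": {"state": "RUNNING", "worker_id": "10.0.0.1:8083"},
    "tasks": [
        {"id": 0, "state": "FAILED", "worker_id": "10.0.0.2:8083"},
    ],
}

for path in failed_task_restart_paths(sample_status):
    print("POST", path)
```

Each path would be sent as a POST to any worker's REST endpoint; whether the restart succeeds still depends on whether the underlying error is recoverable.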