What is a consumer group in Kafka?
A Kafka consumer group is a collection of consumers that work together to consume messages from one or more topics. Its primary purpose is to allow multiple consumers to share the workload of reading messages from a topic in a scalable and fault-tolerant manner.
Core Concept
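When multiple consumers belong to the same consumer group, Kafka assigns each partition of a topic to only one consumer within that group at any given time. This preserves message order within a partition and prevents two consumers in the same group from reading the same partition concurrently. It does not by itself guarantee exactly-once processing: delivery is at-least-once by default, so a message can be processed again after a failure or rebalance if its offset had not yet been committed.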
Key Characteristics
- Each consumer group is identified by a unique group.id.
- A topic's partitions are divided among the consumers in the group. If there are more consumers than partitions, some consumers will be idle. If there are fewer consumers than partitions, some consumers will read from multiple partitions.
- Offsets are tracked per consumer group per partition. The committed offset marks the position of the next message the group will read from that partition, which allows the group to resume consumption from where it left off, even if a consumer fails or leaves the group.
- Kafka automatically handles rebalancing of partition assignments among consumers whenever a consumer joins or leaves the group, or when the set of partitions for a subscribed topic changes (for example, when partitions are added).
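The way partitions are divided among group members can be illustrated with a small simulation. This is only a sketch in plain Python, not Kafka client code: the function name assign_round_robin and the consumer names are hypothetical, and Kafka's real assignment strategies (range, round-robin, sticky and others) differ in their details.

```python
def assign_round_robin(num_partitions, consumers):
    """Deal partitions out to consumers one at a time, in sorted member order.

    Hypothetical helper for illustration; it mimics the idea of a
    round-robin assignor for a single topic, not Kafka's actual code.
    """
    if not consumers:
        raise ValueError("a group needs at least one consumer")
    members = sorted(consumers)
    assignment = {member: [] for member in members}
    for partition in range(num_partitions):
        owner = members[partition % len(members)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: one consumer reads two partitions.
fewer = assign_round_robin(3, ["c1", "c2"])

# More consumers than partitions: the extra consumer gets nothing and sits idle.
more = assign_round_robin(2, ["c1", "c2", "c3"])

print(fewer)
print(more)
```

In both cases every partition has exactly one owner within the group, which is the property the characteristics above rely on.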
Benefits
- Scalability: By adding more consumers to a group, you can increase the parallel processing capacity for a topic.
- Fault Tolerance: If a consumer fails, Kafka automatically reassigns its partitions to other active consumers in the same group, ensuring continuous message processing.
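- Load Balancing: Partitions, and therefore the messages in them, are distributed across the consumers in the group. How evenly the load is spread depends on how evenly messages are distributed across partitions.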
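The fault-tolerance behaviour can be sketched the same way. The simulation below is plain Python under stated assumptions: reassign_after_failure, the consumer names, and the offset values are all made up for illustration, and the "fewest partitions first" rule is a simplification of what a real assignor does. The point it shows is that committed offsets belong to the group, so whichever consumer inherits a partition resumes from the group's committed position.

```python
def reassign_after_failure(assignment, failed):
    """Move the failed member's partitions to the survivors.

    Hypothetical helper: each orphaned partition goes to the survivor
    that currently owns the fewest partitions (ties broken by name).
    """
    survivors = {m: list(ps) for m, ps in assignment.items() if m != failed}
    if not survivors:
        raise ValueError("no consumers left in the group")
    for partition in sorted(assignment.get(failed, [])):
        target = min(sorted(survivors), key=lambda m: len(survivors[m]))
        survivors[target].append(partition)
    return survivors


# Committed offsets are stored per group and partition, not per consumer.
# Values are illustrative: partition -> next offset the group will read.
committed = {0: 120, 1: 98, 2: 143}

before = {"c1": [0], "c2": [1], "c3": [2]}
after = reassign_after_failure(before, "c2")

# c1 inherits partition 1 and resumes from the group's committed offset.
resume_at = {p: committed[p] for p in after["c1"]}

print(after)
print(resume_at)
```

Any messages the failed consumer had processed but not yet committed would be read again by the new owner, which is why processing is at-least-once unless the application takes extra steps.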
Example Scenario
Imagine a web-logs topic with 3 partitions. You could have two applications processing these logs:
- An 'analytics' application (Consumer Group A) that counts unique visitors. It might have 3 consumers, each processing one partition.
- A 'security-alert' application (Consumer Group B) that scans for suspicious activity. It might have 1 consumer that processes all 3 partitions.
Both applications (consumer groups) receive all messages from the web-logs topic independently of each other. Within each group, every partition is read by exactly one consumer, so the group's consumers work in parallel without reading each other's partitions.
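The scenario can be mimicked with an in-memory stand-in for the topic. Everything in this sketch is hypothetical (the log lines, the consumer names a1, a2, a3 and s1, and the consume_all helper); it uses no Kafka client and only illustrates that each group sees the full topic while splitting the work differently.

```python
from collections import defaultdict

# Hypothetical stand-in for the web-logs topic: partition -> messages.
web_logs = {
    0: ["GET /", "GET /about"],
    1: ["POST /login"],
    2: ["GET /admin", "GET /admin"],
}

# Group A has three consumers, group B has one consumer for all partitions.
group_assignments = {
    "analytics": {"a1": [0], "a2": [1], "a3": [2]},
    "security-alert": {"s1": [0, 1, 2]},
}


def consume_all(assignment):
    """Return the messages each consumer in one group reads."""
    seen = defaultdict(list)
    for consumer, partitions in assignment.items():
        for partition in partitions:
            seen[consumer].extend(web_logs[partition])
    return dict(seen)


received = {group: consume_all(a) for group, a in group_assignments.items()}

# Each group reads every message in the topic, regardless of group size.
total_per_group = {
    group: sum(len(msgs) for msgs in per_consumer.values())
    for group, per_consumer in received.items()
}

print(total_per_group)
```

Here the analytics group spreads the five messages over three consumers, while the single security-alert consumer reads all five itself.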