What is a Kafka offset?

In Apache Kafka, an offset is a unique identifier assigned to each message within a specific partition. It serves as a sequential, zero-based integer that indicates the position of a message in a partition's log.

What is a Kafka Offset?

Every message written to a Kafka partition is appended to the partition's log and is assigned a monotonically increasing offset. This offset acts like an index, uniquely identifying a message within that particular partition. For example, the first message in a partition will have an offset of 0, the second message an offset of 1, and so on.

It's crucial to understand that offsets are unique *per partition*, not globally across a topic or cluster. The same offset number might refer to different messages in different partitions.
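A minimal sketch of per-partition offset assignment, using a toy in-memory model rather than a real Kafka client. The `Topic` class and the message strings are hypothetical names chosen for illustration; a real broker does far more than this.

```python
class Topic:
    """Toy in-memory topic: each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        log = self.partitions[partition]
        offset = len(log)  # the next offset equals the current log length
        log.append(message)
        return offset


topic = Topic(num_partitions=2)
first_p0 = topic.append(0, "order-created")    # offset 0 in partition 0
second_p0 = topic.append(0, "order-paid")      # offset 1 in partition 0
first_p1 = topic.append(1, "user-signed-up")   # offset 0 in partition 1
```

Here offset 0 exists in both partitions but identifies a different message in each, which is why a message is only fully addressed by topic, partition, and offset together.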

Why are Offsets Important?

  • Consumer Progress Tracking: Consumers use offsets to keep track of which messages they have already processed from a partition. When a consumer reads messages, it tracks the offset of the last message successfully processed.
  • Fault Tolerance and Recovery: If a consumer crashes or is restarted, it resumes from its last committed offset. Provided offsets are committed only after processing, no message is skipped, although messages processed after the last commit may be delivered again (at-least-once semantics).
  • Re-processing Data: Consumers can explicitly reset their position to an earlier offset (for example, the offset that corresponds to a given timestamp), allowing them to re-read and re-process historical data if needed for analysis, debugging, or error recovery.
  • Consumer Group Coordination: Within a consumer group, each partition is assigned to exactly one consumer instance, and committed offsets are stored per group and per partition. When a partition is reassigned during a rebalance, the new owner resumes from the group's committed offset, so each message is normally processed by only one consumer in the group.
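The progress-tracking and re-processing points above can be sketched with a toy reader over a single partition. `PartitionReader`, `poll`, and `seek` are illustrative names in this sketch; the real Kafka clients expose similarly named operations, but this is not their implementation.

```python
class PartitionReader:
    """Toy reader that tracks its position in one partition's log."""

    def __init__(self, log, position=0):
        self.log = log
        self.position = position  # offset of the next message to read

    def poll(self, max_records=2):
        # Return up to max_records (offset, message) pairs, then advance.
        end = self.position + max_records
        batch = list(enumerate(self.log))[self.position:end]
        self.position += len(batch)
        return batch

    def seek(self, offset):
        # Rewind (or skip ahead) by overwriting the position.
        self.position = offset


reader = PartitionReader(["a", "b", "c", "d"])
first_batch = reader.poll()    # offsets 0 and 1
second_batch = reader.poll()   # offsets 2 and 3
reader.seek(1)                 # rewind to re-process from offset 1
replayed = reader.poll()       # offsets 1 and 2 again
```

Because the log itself is never modified by reading, rewinding only changes the reader's position; the messages and their offsets stay where they are.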

How Offsets are Managed

Consumers are responsible for 'committing' their offsets to Kafka. This means they periodically inform Kafka (specifically, the broker designated as the group coordinator) of their position in each assigned partition. By convention, the committed value is the offset of the next message to read, that is, the offset of the last successfully processed message plus one. Kafka stores these committed offsets in a special internal topic called __consumer_offsets.

When a consumer starts, it queries the group coordinator for the last committed offset of each assigned partition and begins reading at that offset. If the group has no committed offset for a partition, the consumer's auto.offset.reset setting determines where it starts. This mechanism lets a consumer stop and restart without losing its place.
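As a rough sketch of the commit-and-resume cycle, the toy store below keys committed offsets by group, topic, and partition. `OffsetStore` is a hypothetical stand-in for the __consumer_offsets topic, the group names are made up, and the fallback of 0 for a group with no commit is an assumption of this sketch (in a real consumer that choice is governed by configuration). It follows the convention that the committed value is the offset of the next message to read.

```python
class OffsetStore:
    """Toy stand-in for the __consumer_offsets internal topic."""

    def __init__(self):
        self._committed = {}

    def commit(self, group, topic, partition, next_offset):
        # Store the offset of the NEXT message the group should read.
        self._committed[(group, topic, partition)] = next_offset

    def fetch(self, group, topic, partition, default=0):
        # default=0 is an assumption of this sketch, not Kafka behavior.
        return self._committed.get((group, topic, partition), default)


store = OffsetStore()
last_processed = 41
store.commit("billing", "topic-A", 0, last_processed + 1)

resume_at = store.fetch("billing", "topic-A", 0)      # 42
other_group = store.fetch("analytics", "topic-A", 0)  # 0, no commit yet
```

Keying by group means two consumer groups can read the same partition at entirely independent positions.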

Key Characteristics

  • Partition-Specific: Offsets are unique within a partition, not across a topic.
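  • Sequential and Monotonic: Offsets within a partition are assigned in strictly increasing order. Consumers may still see gaps between offsets, for example after log compaction or where transaction markers occupy offsets.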
  • Consumer-Managed (Commit): Consumers decide when to commit their offsets, influencing their recovery point.
  • Immutable for Messages: Once a message is written to a partition with a specific offset, that message's offset never changes.

Example Scenario

Imagine a consumer assigned to 'topic-A', partition 0. It reads messages up to offset 100, processes them, and then commits offset 101. It goes on to read and process offsets 101 through 104, but crashes before it can commit offset 105. Upon restart, it fetches its last committed offset (101) and resumes reading from offset 101. Messages 101 to 104, which were processed but never committed, are therefore processed a second time. This is the duplicate delivery that 'at least once' semantics permit: nothing is lost, but the application must tolerate repeats.
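The scenario can be replayed with a small simulation. This is a sketch under simplifying assumptions: a "crash" is modeled as a run that processes messages but never commits, and `run`, `state`, and `processed` are hypothetical names for illustration.

```python
state = {"committed": 0}   # committed offset = next offset to read
processed = []             # every offset handled, including repeats


def run(stop, commit):
    """Process from the committed offset up to stop - 1, then maybe commit."""
    start = state["committed"]
    for offset in range(start, stop):
        processed.append(offset)
    if commit:
        state["committed"] = stop
    return start


first_start = run(101, commit=True)     # processes 0..100, commits 101
second_start = run(105, commit=False)   # processes 101..104, crashes first
restart_start = run(110, commit=True)   # restart resumes at 101

# Offsets that were handled more than once.
duplicates = sorted(o for o in set(processed) if processed.count(o) > 1)
```

After the restart, `duplicates` holds offsets 101 through 104 and no offset is missing from `processed`, which is the at-least-once outcome described above.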