What is a Kafka topic?
In Apache Kafka, a topic is a fundamental concept that serves as a category name or feed name to which records are published by producers. It's the primary way Kafka organizes data, allowing for highly scalable and fault-tolerant message streaming.
What is a Kafka Topic?
A Kafka topic can be thought of as a log or a stream of data. Producers write data (records) to topics, and consumers read data from topics. Each record consists of an optional key, a value, a timestamp, and optional headers. Topics are identified by unique names within a Kafka cluster.
Topics act as durable, append-only logs of messages. Once a message is written to a topic, it persists until a configurable retention limit is reached, either an age limit (retention.ms) or a per-partition size limit (retention.bytes), whether or not consumers have already read it.
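The retention behaviour described above can be sketched with a small in-memory model. This is a toy illustration, not a Kafka API: the names Record, TopicLog, and enforce_retention are hypothetical, and real brokers delete whole log segments rather than individual records.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    """Hypothetical record shape: optional key, value, and timestamp."""
    key: Optional[bytes]
    value: bytes
    timestamp_ms: int


@dataclass
class TopicLog:
    """Toy append-only log with time-based retention (not a Kafka API)."""
    retention_ms: int
    records: List[Record] = field(default_factory=list)

    def append(self, record: Record) -> None:
        # Records are only ever added at the end; existing ones are never modified.
        self.records.append(record)

    def read_all(self) -> List[Record]:
        # Reading returns a copy and removes nothing from the log.
        return list(self.records)

    def enforce_retention(self, now_ms: int) -> None:
        # Drop records older than the retention window, whether or not they were read.
        cutoff = now_ms - self.retention_ms
        self.records = [r for r in self.records if r.timestamp_ms >= cutoff]


topic_log = TopicLog(retention_ms=1000)
topic_log.append(Record(b"user-1", b"login", timestamp_ms=0))
topic_log.append(Record(b"user-2", b"click", timestamp_ms=900))

first_read = topic_log.read_all()
second_read = topic_log.read_all()         # reading twice changes nothing
topic_log.enforce_retention(now_ms=1500)   # cutoff is 500, so the first record expires
remaining = topic_log.read_all()
```

The point of the sketch is that reads and deletion are unrelated: only the retention check removes data.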
Partitions: The Backbone of Topics
To achieve scalability and parallelism, a Kafka topic is divided into one or more partitions. Each partition is an ordered, immutable sequence of records, and each record within a partition is assigned an incremental ID called an offset.
- Ordering: Messages within a single partition are always ordered by their offset. Kafka only guarantees ordering within a partition, not across the entire topic.
- Scalability: Partitions are distributed across different brokers (servers) in a Kafka cluster, allowing for concurrent reads and writes.
- Replication: Each partition can be replicated across multiple brokers to provide fault tolerance. One replica acts as the leader for the partition; if its broker fails, an in-sync replica on another broker is elected leader and takes over.
- Consumer Groups: A consumer group can have multiple consumer instances, and each instance typically reads from one or more distinct partitions of a topic, enabling parallel consumption.
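To make offsets and per-partition ordering concrete, here is a minimal sketch in which each partition is a plain list and the list index plays the role of the offset. The Partition and Topic classes and the "orders" topic are made up for illustration and do not correspond to any Kafka client API.

```python
from typing import List, Tuple


class Partition:
    """Toy partition: an ordered list whose index doubles as the offset."""

    def __init__(self) -> None:
        self._records: List[str] = []

    def append(self, value: str) -> int:
        # The next offset is simply the current length of the partition.
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, from_offset: int = 0) -> List[Tuple[int, str]]:
        # Return (offset, value) pairs in offset order, starting at from_offset.
        return [(o, v) for o, v in enumerate(self._records) if o >= from_offset]


class Topic:
    """Toy topic: a name plus a fixed number of independent partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

    def append(self, partition: int, value: str) -> int:
        return self.partitions[partition].append(value)


orders = Topic("orders", num_partitions=2)
offsets = [
    orders.append(0, "order-a created"),
    orders.append(1, "order-b created"),
    orders.append(0, "order-a paid"),
]
partition_0 = orders.partitions[0].read()
partition_1_tail = orders.partitions[1].read(from_offset=1)
```

Each partition numbers its records independently, which is why ordering can only be stated per partition: offset 0 exists in both partitions, and nothing in the model says which of the two was written first.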
Producers and Consumers Interaction
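Producers publish records to specific topics. The producer client decides which partition each record goes to: it can name a partition explicitly, or supply a key, which the default partitioner hashes so that records with the same key land in the same partition (as long as the partition count does not change). If no key is provided, the client spreads records across partitions: round-robin in older clients, and one partition per batch ("sticky" partitioning) by default since Kafka 2.4.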
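The key-to-partition mapping can be sketched as follows. This is a simplified stand-in, not the real partitioner: Kafka's Java client hashes key bytes with murmur2, whereas this sketch uses CRC32 from the standard library, and the keyless fallback is plain round-robin for simplicity. The function name choose_partition is hypothetical.

```python
import zlib
from itertools import count
from typing import Iterator, Optional


def choose_partition(
    key: Optional[bytes],
    num_partitions: int,
    keyless_counter: Iterator[int],
) -> int:
    """Pick a partition index for a record (illustrative only)."""
    if key is not None:
        # Same key bytes -> same hash -> same partition, for a fixed partition count.
        # CRC32 is an assumption made here; it is not the hash Kafka uses.
        return zlib.crc32(key) % num_partitions
    # No key: cycle through the partitions in order.
    return next(keyless_counter) % num_partitions


counter = count()
keyed = [choose_partition(b"user-42", 3, counter) for _ in range(3)]
keyless = [choose_partition(None, 3, counter) for _ in range(4)]
```

Because the result depends on num_partitions, adding partitions to a topic can move a key to a different partition, which is worth remembering when per-key ordering matters.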
Consumers subscribe to one or more topics and read records from them. When multiple consumers belong to the same consumer group, Kafka assigns each partition to exactly one consumer instance within that group, balancing the load and allowing for parallel processing of messages. A group can therefore keep at most as many consumers busy as the topic has partitions; any additional members sit idle.
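One way to picture the assignment is a simple round-robin over sorted group members. This is only a sketch: real Kafka clients use pluggable assignors with their own strategies, and the function name assign_partitions and the consumer names below are hypothetical.

```python
from typing import Dict, List


def assign_partitions(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Give every partition to exactly one consumer in the group (illustrative only)."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        # Deal partitions out one at a time, like cards.
        assignment[members[index % len(members)]].append(partition)
    return assignment


balanced = assign_partitions([0, 1, 2, 3], ["consumer-a", "consumer-b"])
oversized = assign_partitions([0, 1], ["consumer-a", "consumer-b", "consumer-c"])
```

In the second call there are more consumers than partitions, so one member ends up with an empty assignment, which mirrors the idle-consumer case.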
Key Characteristics of Topics
- Durability: Records in topics are persisted to disk and, when the topic's replication factor is greater than one, copied to other brokers, which protects data against the loss of a single broker.
- Decoupling: Producers and consumers operate independently, without needing to know each other's details. Topics act as an intermediary.
- Multi-subscriber: Multiple consumer groups can independently read all messages from a topic without affecting each other.
- Retention: Messages are retained in topics for a configurable period, allowing consumers to process them at their own pace or for replays if needed.
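The multi-subscriber and replay properties follow from each consumer group tracking its own position in the log. A minimal sketch, assuming a single partition modelled as a Python list and a hypothetical GroupCursor class (not a Kafka client API):

```python
from typing import List


class GroupCursor:
    """Toy per-group read position over a shared, unmodified log."""

    def __init__(self, group_id: str, log: List[str]) -> None:
        self.group_id = group_id
        self._log = log
        self.position = 0  # offset of the next record this group will read

    def poll(self, max_records: int) -> List[str]:
        # Read forward from this group's own position; the log itself is untouched.
        batch = self._log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def seek(self, offset: int) -> None:
        # Moving the position backwards replays records that are still retained.
        self.position = offset


events = ["e0", "e1", "e2"]
billing = GroupCursor("billing", events)
analytics = GroupCursor("analytics", events)

billing_batch = billing.poll(2)
analytics_batch = analytics.poll(3)   # unaffected by what billing has read
billing.seek(0)
replayed = billing.poll(3)            # billing re-reads from the start
```

Since reading only advances a group's own position, one group falling behind or rewinding has no effect on any other group.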