What is ZooKeeper and how does Kafka use it?

Apache ZooKeeper is a centralized service for maintaining configuration information, naming, distributed synchronization, and group services, and many distributed systems depend on it. Kafka historically relied on ZooKeeper for cluster metadata and coordination; newer Kafka versions replace it with KRaft, a metadata quorum built into Kafka itself.

What is Apache ZooKeeper?

Apache ZooKeeper is an open-source coordination service for distributed applications. It exposes a small set of primitives: a hierarchical namespace of data nodes (znodes), ephemeral and sequential znodes, and watches that notify clients when a znode changes. Applications build higher-level recipes on top of these, such as group membership, distributed locks, and leader election. Updates are replicated across the ZooKeeper ensemble with an atomic broadcast protocol (Zab), which gives applications a consistent and highly available store for configuration data and group metadata, so they can focus on their core logic rather than on distributed coordination.
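To make the "distributed lock" recipe concrete, here is a minimal sketch in Python. It does not talk to a real ZooKeeper ensemble: ToyZooKeeper is a hypothetical in-memory stand-in, and the path prefix and client names are invented for illustration. It models the usual idea of the recipe, in which each client creates an ephemeral sequential znode and the owner of the lowest-numbered one holds the lock.

```python
import itertools


class ToyZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self._seq = itertools.count()
        self._nodes = {}  # znode name -> owning session id

    def create_ephemeral_sequential(self, prefix, session_id):
        # The server appends a monotonically increasing, zero-padded counter.
        name = f"{prefix}{next(self._seq):010d}"
        self._nodes[name] = session_id
        return name

    def children(self, prefix):
        return sorted(n for n in self._nodes if n.startswith(prefix))

    def owner(self, name):
        return self._nodes[name]

    def close_session(self, session_id):
        # Ephemeral znodes vanish when their owning session ends.
        self._nodes = {n: s for n, s in self._nodes.items() if s != session_id}


def lock_holder(zk, prefix):
    """The client owning the lowest-numbered znode holds the lock."""
    kids = zk.children(prefix)
    return zk.owner(kids[0]) if kids else None


zk = ToyZooKeeper()
zk.create_ephemeral_sequential("/locks/job-", "client-a")
zk.create_ephemeral_sequential("/locks/job-", "client-b")
first = lock_holder(zk, "/locks/job-")
zk.close_session("client-a")  # client-a crashes or releases the lock
second = lock_holder(zk, "/locks/job-")
print(first, second)  # client-a client-b
```

Because the znodes are ephemeral, a crashed lock holder cannot block the others forever: its znode disappears with its session and the next client in sequence takes over.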

How Kafka Historically Used ZooKeeper

In earlier versions, Kafka was deeply integrated with ZooKeeper, leveraging it for critical metadata management and coordination tasks. ZooKeeper served as the primary source of truth for the Kafka cluster's state. Key functions it managed included:

  • Broker Registration: Each Kafka broker registered itself with ZooKeeper upon startup, allowing the cluster to maintain an up-to-date list of available brokers.
  • Controller Election: ZooKeeper was used to elect a single broker as the 'controller' for the entire Kafka cluster. The controller is responsible for electing partition leaders, reacting to broker failures, and carrying out partition reassignments.
  • Topic Configuration: Metadata about topics, such as the number of partitions, the replication factor, and the assignment of replicas to brokers, was stored in ZooKeeper.
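  • Consumer Group Management: For the old consumer clients (superseded by the new consumer introduced in Kafka 0.9), ZooKeeper tracked committed offsets and consumer group membership, which drove partition assignment among the consumers in a group. Newer consumers store offsets in the internal __consumer_offsets topic and coordinate through a broker-side group coordinator instead.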
  • Access Control Lists (ACLs): Security-related configurations like ACLs, defining permissions for users and applications to produce or consume from specific topics, were also stored in ZooKeeper.
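The first two items, broker registration and controller election, can be sketched with a toy model. The Python below is an illustration under stated assumptions, not Kafka's implementation: ToyZooKeeper and the helper functions are hypothetical, the znode data is simplified to a bare broker id, and the watch that real brokers set on the controller znode is replaced by an explicit retry loop. The paths /brokers/ids/<id> and /controller mirror the ephemeral znodes Kafka used in ZooKeeper mode.

```python
class ToyZooKeeper:
    """Hypothetical in-memory stand-in for a ZooKeeper ensemble (illustration only)."""

    def __init__(self):
        self.znodes = {}  # path -> (data, owning session)

    def create_ephemeral(self, path, data, session):
        """Create the znode unless it already exists; report success."""
        if path in self.znodes:
            return False
        self.znodes[path] = (data, session)
        return True

    def expire_session(self, session):
        """Ephemeral znodes disappear when their owning session ends."""
        self.znodes = {p: v for p, v in self.znodes.items() if v[1] != session}

    def live_brokers(self):
        prefix = "/brokers/ids/"
        return sorted(int(p[len(prefix):]) for p in self.znodes if p.startswith(prefix))

    def controller(self):
        entry = self.znodes.get("/controller")
        return entry[0] if entry else None


def register_broker(zk, broker_id):
    zk.create_ephemeral(f"/brokers/ids/{broker_id}", broker_id, f"session-{broker_id}")


def try_become_controller(zk, broker_id):
    # Only the first broker to create /controller wins the election.
    return zk.create_ephemeral("/controller", broker_id, f"session-{broker_id}")


zk = ToyZooKeeper()
for broker_id in (1, 2, 3):
    register_broker(zk, broker_id)
    try_become_controller(zk, broker_id)
first_controller = zk.controller()  # broker 1 got there first

zk.expire_session("session-1")      # broker 1 fails; its znodes vanish
survivors = zk.live_brokers()
for broker_id in survivors:         # survivors race for /controller again
    try_become_controller(zk, broker_id)
second_controller = zk.controller()

print(first_controller, survivors, second_controller)  # 1 [2, 3] 2
```

The same mechanism covers both concerns: a broker's registration and its controllership are tied to its session, so a single session expiry removes it from the live-broker list and frees the controller role for re-election.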

Kafka's Evolution: The Move Away from ZooKeeper (KRaft)

While ZooKeeper was fundamental to Kafka's stability, running a separate ZooKeeper ensemble added operational overhead and limited how far cluster metadata could scale. This led to KRaft (Kafka Raft metadata mode), introduced by KIP-500. KRaft removes the ZooKeeper dependency entirely by moving metadata management into Kafka itself, using the Raft consensus algorithm.

With KRaft, a set of Kafka nodes running the controller role form a Raft quorum that stores and replicates all cluster metadata, taking over the job ZooKeeper used to do. One quorum member acts as the active controller while the others stay ready to take over. This simplifies Kafka's architecture, reduces operational complexity, lets a cluster hold more partitions, and shortens controller failover. KRaft was declared production-ready in Kafka 3.3, and Kafka 4.0 removed ZooKeeper mode altogether, so current clusters are self-contained and need no external ZooKeeper ensemble.
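Raft commits a change only once a majority of the quorum has acknowledged it, which determines how many controller failures a cluster can survive. The arithmetic is sketched below in Python; the function names are hypothetical and the code is a worked calculation, not part of Kafka.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a Raft majority."""
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority still exists."""
    return voters - majority(voters)


def is_committed(acks: int, voters: int) -> bool:
    """A metadata record is committed once a majority has acknowledged it."""
    return acks >= majority(voters)


for n in (1, 3, 5):
    print(n, majority(n), tolerated_failures(n))
# 1 1 0
# 3 2 1
# 5 3 2
```

This is why controller quorums are usually sized with an odd number of nodes: going from three voters to four raises the majority from 2 to 3 but still tolerates only one failure.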