Kafka is a messaging framework for building real time streaming applications. It allows to build distributed publish-subscribe systems. In this article we will present the core concepts of the framework.
One way to start with Kafka is to understand its APIs. There are four of them :
- The Producer API allows to publish records to a particular category of message called topic.
- The Streams API allows to transform messages from a topic into messages of a output topic
- The Consumer API allows to subscribe to messages published.
- The Connect API allows to send messages to other applications such as a Database
Clients that expose those APIs are available in several languages such as Java, C#, C++ and Python.
Kafka is based on the concept of topic. A topic is a category of messages. It is divided into several parts called partitions. The whole topic represents a log. A partition is an append only, ordered sequence of records.
Each record in a partition is uniquely identified by an id called the offset. The record consists in a key, a time stamp, and a value. Records are saved on disk and have an automatic retention policy: they are erased after the configured period. Partitions can also be replicated.
The producer chooses to which partition he will write into. For example it could be in a round robin fashion or based on a hash function.
A consumer group represents a set of consumer instances. The consumption of the partitions is fairly divided among the consumer instances. Each consumer instance is the exclusive consumer of “his” partitions.
Key elements to understand
Kafka can deal with large amount of data with little overhead thanks to key features:
- Consumption managed by consumers means less overhead on the server side
- Partitions having different consumers provide parallelism and a way to scale
- Automatic deletion gives the fastest way to delete old record
The framework also give some guarantees:
- Partitions replication provides fault tolerance
- Persistent messages allows to replay computation for a consumer
- A consumer has an exclusivity (within its group) over the partitions he consumes
- Writes in a partition are append only and ordered
Kafka is made with simple choices yet rather innovative. Those choices makes it robust, fast and scalable.