LinkedIn’s Kafka paper: Real-time log processing
The messaging queue and log aggregator hybrid
Modern applications generate vast amounts of log data. From user actions like likes, comments, and clicks to backend events such as system metrics and call stacks, logs are crucial for understanding both application performance and user behavior. Traditionally, these logs were used for aggregated operational analytics. This meant that the processing of these logs didn’t need to be real-time. However, as applications have grown in complexity, the need for real-time log processing has become more pressing. Use cases such as product search relevance, recommendation systems, and ad targeting often require the analysis of past user interactions in real time. Processing this data quickly is challenging, as it involves not only the items a user interacted with but also other items they did not engage with.
LinkedIn designed Kafka to address these challenges — providing a system capable of handling high volumes of log data in real-time with low latency.
Log aggregator and messaging queue hybrid
Kafka is designed to process large amounts of log data while exposing a simple API similar to traditional message queues. It draws inspiration from both traditional messaging systems and log aggregators, while avoiding the elements of each that would introduce unnecessary overhead.
While traditional messaging systems, which Kafka’s API aims to emulate, have been around for a long time, they aren’t ideal for Kafka’s use case. One key limitation is the strong delivery guarantees these systems often provide, which introduce significant overhead and reduce throughput. In Kafka’s use case, losing a few messages is generally acceptable, so it doesn’t prioritize these guarantees; dropping a handful of user click events, for example, has no meaningful impact on the application. Another issue is that traditional message queues are typically optimized for scenarios where the backlog of unconsumed messages is small. When messages accumulate, these systems struggle with scalability and performance, because they aren’t built to persist large volumes of data. Finally, the queuing systems available when Kafka was developed didn’t prioritize throughput or offer features like batched message writes, which Kafka leverages to improve performance.
On the other hand, traditional log aggregators usually rely on a push model, where log data is scraped from production servers and pushed to consumers. Kafka, by contrast, uses a pull model, allowing each consumer to fetch messages at their own pace. This helps avoid overwhelming consumers with large bursts of data and ensures more reliable processing. The pull model also enables Kafka’s “time travel” feature, allowing consumers to replay historical data as needed.
By combining the strengths of both message queues and log aggregators, Kafka acts as a scalable log broker and storage system, simplifying message delivery while maintaining high throughput. Its distributed architecture ensures that it can handle large volumes of data efficiently, making it a powerful tool for both real-time and historical data processing.
Architecture
In Kafka, a stream of messages is called a “topic.” A producer publishes messages to a topic, and these messages are stored on brokers. Consumers subscribe to one or more topics and pull messages from the brokers.
Kafka assumes messages are small and simple. Each message is just a byte sequence, serialized in any format preferred by the developer. Producers interact with Kafka’s API to send messages to a topic, while brokers store these messages. On the consumer’s end, each subscribed topic’s messages are consumed with an iterator that never terminates. If there are no messages to consume, the iterator blocks until new messages arrive.
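To make the API concrete, here is a minimal sketch of producing and consuming with the modern Java client (the paper’s original client was simpler); the broker address, topic name, and string serializers are illustrative assumptions rather than values from the paper.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClickEventsExample {
    public static void main(String[] args) {
        // Producer: publish a message (an arbitrary payload, here a String) to a topic.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("click-events", "user-42", "clicked:item-123"));
        }

        // Consumer: pull messages from the subscribed topic in an endless loop,
        // mirroring the "iterator that never terminates" described above.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "click-analytics");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("click-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```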
Kafka is distributed by nature, so a Kafka cluster typically consists of multiple brokers. Topics are divided into multiple partitions, and each broker may store one or more partitions of a topic to balance the load and ensure scalability.
The following image shows how Kafka’s architecture is laid out. As described above, multiple producers and consumers can work with any number of topics at the same time:

Storage
Kafka represents each partition as a “logical log,” and each log is divided into segment files of roughly equal size. When a message is produced, it is appended to the end of a segment file. These segment files are only flushed to disk once a configurable number of messages have been added or after a certain time period has passed. A message is only made available to consumers after it has been flushed.
One of Kafka’s most innovative design choices is the lack of a unique ID for each message, which eliminates the need for complex mappings between IDs and data. Instead, Kafka uses offsets. Since messages are appended sequentially to a segment file, a consumer always consumes them in order, and when a consumer acknowledges a particular offset, it implicitly acknowledges that all messages preceding it in the partition have been processed. Each pull request from the consumer includes the offset from which to begin reading and the number of bytes to fetch. Each broker keeps a sorted list of offsets in memory, including the offset of the first message in every segment file. Unfortunately, the paper doesn’t say how long these offsets are kept in memory.
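As a rough illustration of how such an in-memory list lets a broker resolve a pull request’s starting offset to a segment file, here is a small sketch; the class and method names are hypothetical and not taken from Kafka’s codebase.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not Kafka's actual code: map each segment file's first
// (base) offset to its file name so a pull request's starting offset can be
// resolved with a floor lookup over the sorted in-memory list.
class SegmentIndex {
    private final TreeMap<Long, String> segmentsByBaseOffset = new TreeMap<>();

    void addSegment(long baseOffset, String fileName) {
        segmentsByBaseOffset.put(baseOffset, fileName);
    }

    String segmentFor(long requestedOffset) {
        Map.Entry<Long, String> entry = segmentsByBaseOffset.floorEntry(requestedOffset);
        if (entry == null) {
            throw new IllegalArgumentException("offset is older than the earliest retained segment");
        }
        return entry.getValue();
    }
}
```

Once the segment is found, the broker can read the requested number of bytes sequentially from that file.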
After consuming a message, the consumer requests the next one by calculating the new offset from the previous message. The following image shows the offsets stored in a broker’s memory and its segment files:

Transfers
Under the hood, Kafka efficiently transfers multiple sets of messages in a single request. An interesting aspect of Kafka’s design is the lack of caching layers, which might seem counterintuitive. However, this is a practical choice given Kafka’s real-time processing use case. Since producers always append to the end of the file and consumers read sequentially, Kafka minimizes the need for caching by leveraging the operating system’s page cache.
To optimize further, Kafka uses the sendfile API on Linux, which transfers data directly from a file channel to a socket channel. This avoids the overhead of the typical read/write path, which involves the following steps:
- Read the data from the storage media into the OS page cache
- Copy the data from the page cache into an application buffer
- Copy the application buffer into another kernel buffer
- Send that kernel buffer to the socket
Since a socket in Linux is exposed as just another file descriptor, the sendfile API skips steps 2 and 3 and moves data directly from the page cache to the socket.
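In Java, this zero-copy path is exposed through FileChannel.transferTo, which delegates to sendfile on Linux. The sketch below streams a file to a socket that way; the segment file name and destination address are made up for illustration, and this is not Kafka’s broker code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySend {
    public static void main(String[] args) throws IOException {
        // Hypothetical segment file and destination; both are assumptions for this sketch.
        try (FileChannel segment = FileChannel.open(Path.of("segment-00000000.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
            long position = 0;
            long remaining = segment.size();
            // transferTo delegates to sendfile(2) on Linux, so bytes move from the
            // OS page cache to the socket without being copied into user space.
            while (remaining > 0) {
                long sent = segment.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```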
Time travel
Thanks to the storage and offset mechanisms described earlier, Kafka allows consumers to “rewind” and re-consume data from an earlier offset. This feature is invaluable when reprocessing data after a failure. Rewinding the offset allows consumers to reprocess messages as if they had never been consumed. The pull model makes this process much simpler than the push model used by most other log processors, which generally don’t support such flexibility.
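With the modern Java client, rewinding amounts to assigning a partition and seeking back to an earlier offset before polling again; the topic, partition number, and group id below are illustrative.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplayExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "replay-job");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("click-events", 0);
            consumer.assign(List.of(partition));
            // Rewind to the earliest retained offset and re-consume as if nothing
            // had been processed yet.
            consumer.seekToBeginning(List.of(partition));
            for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                System.out.printf("replayed offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```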
Distributed Coordination
Kafka introduces the concept of consumer groups — a collection of one or more consumers that jointly consume messages from a set of topics. Messages from a partition are delivered to only one consumer within a consumer group, ensuring that multiple consumers within a group don’t receive the same message.
Kafka defines the partition as the smallest unit of parallelism, meaning that each partition is consumed by only one consumer in a group at any given time. This avoids the complexity of coordinating multiple distributed consumers of the same partition. Kafka topics can be over-partitioned to balance the load evenly across consumers.
For higher-level coordination, Kafka uses Zookeeper. When a new broker or consumer starts up, it is recorded in a registry in Zookeeper. For a consumer, Zookeeper tracks the topics and partitions it subscribes to and the consumer group it belongs to. For each consumer group, Zookeeper maintains an ownership registry and an offset registry: the ownership registry maps every partition to the consumer currently consuming it, and the offset registry stores the offset of the last consumed message in each partition. If a consumer suddenly goes down, a replacement can read the offset for the partition assigned to it from Zookeeper and pick up processing from there.
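The effect of consumer groups is easy to see with the modern Java client: start the sketch below twice with the same group id (a hypothetical name here), and the two instances split the topic’s partitions between them instead of both receiving every message.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupMember {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "click-analytics");           // same group id in every instance
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("click-events"));
            while (true) {
                // Each partition is owned by exactly one member of the group at a time,
                // so the partitions seen here are disjoint across running instances.
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofSeconds(1))) {
                    System.out.printf("owned partitions=%s, got partition=%d offset=%d%n",
                            consumer.assignment(), record.partition(), record.offset());
                }
            }
        }
    }
}
```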
Delivery Guarantees
When Kafka’s paper was written, it only guaranteed at-least-once delivery, leaving exactly-once delivery to be handled by the client. Since then, however, Kafka has gained native support for exactly-once semantics. Later versions also added replication of messages across brokers: each partition now has a leader broker and follower brokers, all messages are sent to the leader, and the followers replicate them by consuming from the leader much like regular Kafka consumers.
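As a rough sketch of what the later exactly-once support looks like from the producer side with the modern Java client (the transactional id, topic, and broker address are illustrative), enabling idempotence and wrapping sends in a transaction ensures each message lands in the log exactly once:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");             // broker de-duplicates retried sends
        props.put("transactional.id", "click-pipeline-1");   // illustrative id, required for transactions

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            producer.beginTransaction();
            try {
                producer.send(new ProducerRecord<>("click-events", "user-42", "clicked:item-123"));
                producer.commitTransaction();
            } catch (RuntimeException e) {
                // Abort so none of the sends in this transaction become visible to consumers.
                producer.abortTransaction();
                throw e;
            }
        }
    }
}
```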
Kafka does not guarantee the ordering of messages across partitions, but it does guarantee that messages within a single partition are consumed in order.
Conclusion
Kafka was designed carefully and specifically for LinkedIn’s real-time log processing needs. Its use of offsets to let consumers “go back in time,” its reliance on the OS page cache instead of an extra caching layer, and the choices that sidestep much of the usual distributed-coordination complexity are pragmatic, if sometimes unconventional, decisions. Kafka’s ability to scale, sustain high throughput, and process data with low latency has made it a key player in real-time data processing. Overall, this paper was a fascinating read about a system that revolutionized how we handle log data in real time.