Apache Kafka is a distributed messaging system that is often used in place of traditional JMS messaging systems in the big data world. The Apache website describes it this way: “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log”. It was originally developed at LinkedIn and later became an Apache project.
Kafka is designed to be:
- Fast
- Scalable
- Durable
- Distributed by design
Kafka differs in several ways from other message brokers such as RabbitMQ and WebSphere MQ. Its best-known advantage is performance. It also handles consumption differently: consumers pull messages from the brokers at their own pace rather than having messages pushed to them.
Components of Apache Kafka
There are five important components in Kafka, as shown below:
![](https://www.technix.in/wp-content/uploads/2015/11/KAFKAComponents.jpg)
Kafka components
Kafka can have multiple producers and consumers, and it runs as a cluster in a distributed model.
![](https://www.technix.in/wp-content/uploads/2015/11/KAFKACluster.jpg)
Kafka works on a publish-subscribe mechanism. It maintains feeds of messages in “topics”. Processes that publish messages to a Kafka topic are called “producers”, and processes that subscribe to a topic and process the published messages are called “consumers”. Kafka runs as a cluster of one or more servers, each of which is called a “broker”.
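To make the topic, producer, consumer, and broker roles concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the `Broker`, `Producer`, and `Consumer` classes and the `page-events` topic name are hypothetical, and the sketch assumes a single broker with no partitions or replication. It only illustrates the "commit log" idea, where the broker appends messages to a log and each consumer keeps track of its own read position (offset).

```python
from collections import defaultdict


class Broker:
    """Toy stand-in for one broker: an append-only list per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def append(self, topic, message):
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1  # offset of the message just written

    def read(self, topic, offset, max_messages=10):
        # Reading never removes messages; it only returns a slice of the log.
        return self.topics[topic][offset:offset + max_messages]


class Producer:
    """Publishes messages to a topic on the broker."""

    def __init__(self, broker):
        self.broker = broker

    def send(self, topic, message):
        return self.broker.append(topic, message)


class Consumer:
    """Subscribes to one topic and remembers its own offset."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        messages = self.broker.read(self.topic, self.offset, max_messages)
        self.offset += len(messages)  # advance only this consumer's position
        return messages


broker = Broker()
producer = Producer(broker)
for event in ["login", "view", "logout"]:
    producer.send("page-events", event)

# Two independent consumers read the same topic at their own pace.
reporting = Consumer(broker, "page-events")
monitoring = Consumer(broker, "page-events")

first_batch = reporting.poll(max_messages=2)
second_batch = reporting.poll()
monitoring_batch = monitoring.poll()
```

Because the log is not modified by reads, `monitoring` still sees all three messages even though `reporting` has already consumed them.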
Kafka Use cases
- Messaging: Kafka can replace a more traditional message broker, offering better throughput along with built-in partitioning, replication, and fault tolerance.
- Website Activity Tracking: Site activity is published to topics for real-time processing and monitoring, or loaded into Hadoop or offline data warehousing systems for offline processing and reporting.
- Metrics: Kafka aggregates statistics from distributed applications to produce centralized feeds of operational data.
- Log Aggregation: Kafka collects physical log files from servers and puts them in a central place (a file server or HDFS, perhaps) for processing.
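The "built-in partitioning" mentioned under Messaging can be pictured as spreading a topic's messages over several logs, chosen by message key. The sketch below is an illustration only: `choose_partition` is a hypothetical helper, the partition count of 3 is arbitrary, and CRC32 is used here as an assumed stand-in hash, so the real Kafka client may pick partitions differently.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition index (assumed hash: CRC32)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # arbitrary value for this sketch
partitions = [[] for _ in range(NUM_PARTITIONS)]

events = [
    ("alice", "login"),
    ("bob", "login"),
    ("alice", "view"),
    ("carol", "login"),
    ("alice", "logout"),
]

# Each event is appended to the partition selected by its key.
for user, action in events:
    partitions[choose_partition(user, NUM_PARTITIONS)].append((user, action))

# All events for one key land in the same partition, in the order sent.
alice_partition = partitions[choose_partition("alice", NUM_PARTITIONS)]
alice_actions = [action for user, action in alice_partition if user == "alice"]
```

Keying by user keeps each user's events in order within one partition while still letting the topic as a whole be spread across several logs.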
This is just an introduction to Kafka; please refer to the documentation on the Apache site for details.