Apache Kafka is a distributed messaging system that serves as a substitute for traditional JMS messaging systems in the big data world. The Apache website describes it another way: “Apache Kafka is publish-subscribe messaging rethought as a distributed commit log”. It was originally developed by LinkedIn and later became an Apache project.
Features of Kafka:
- Durable
- Distributed by design
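Kafka differs in several ways from other message brokers such as RabbitMQ and WebSphere MQ. Its best-known advantage is performance. It also handles consumption in its own way: consumers pull messages from the brokers at their own pace instead of having the broker push messages to them.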
Components of Apache Kafka
There are five important components in Kafka, as given below.
Kafka can have multiple producers and consumers, and it works as a cluster in a distributed model.
Kafka Cluster diagram
Kafka works on a publish-subscribe mechanism. It maintains feeds of messages in “topics”. Processes that publish messages to a Kafka topic are called “producers”, and processes that subscribe to a topic and process the published messages are called “consumers”. Kafka runs as a cluster of one or more servers, each of which is called a “broker”.
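To make the relationship between topics, producers, consumers and brokers concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the names ToyBroker, ToyConsumer, publish, fetch and poll are hypothetical and chosen only for illustration. It assumes a single broker and a single log per topic, and it ignores networking, persistence and replication.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a broker: each topic is an append-only list."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, message):
        """Producer side: append a message and return its offset in the log."""
        log = self.topics[topic]
        log.append(message)
        return len(log) - 1

    def fetch(self, topic, offset, max_messages=10):
        """Consumer side: read messages starting at the given offset."""
        return self.topics[topic][offset:offset + max_messages]


class ToyConsumer:
    """Each consumer remembers its own offset, so it reads at its own pace."""

    def __init__(self, broker, topic):
        self.broker = broker
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        messages = self.broker.fetch(self.topic, self.offset, max_messages)
        self.offset += len(messages)
        return messages


broker = ToyBroker()
broker.publish("page-views", "user1 viewed /home")
broker.publish("page-views", "user2 viewed /cart")

fast = ToyConsumer(broker, "page-views")
slow = ToyConsumer(broker, "page-views")

print(fast.poll())                # both messages in one call
print(slow.poll(max_messages=1))  # only the first message
print(slow.poll(max_messages=1))  # the second message, on a later poll
```

The point of the sketch is that publishing only appends to a log, and reading does not remove anything from it. Two consumers can therefore read the same topic independently, each tracking its own position.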
Kafka Use cases
- Messaging : A replacement for a more traditional message broker. Kafka offers better throughput, built-in partitioning, replication, and fault tolerance.
- Website Activity Tracking : Real-time processing, real-time monitoring, and loading into Hadoop or offline data warehousing systems for offline processing and reporting.
- Metrics : Aggregating statistics from distributed applications to produce centralized feeds of operational data.
- Log Aggregation : Collecting physical log files from servers and putting them in a central place (a file server or HDFS, perhaps) for processing.
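The messaging use case above mentions built-in partitioning. The rough idea can be sketched as follows: a message key is hashed to pick one of a fixed number of partitions, so all messages with the same key land in the same partition and keep their relative order. This is only an illustration under stated assumptions. The function names choose_partition and route are hypothetical, and zlib.crc32 is a stand-in hash; Kafka's own clients use a different hash function.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a message key to a partition index in a stable way.

    crc32 is an illustrative stand-in, not the hash Kafka itself uses.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Group (key, value) pairs into per-partition lists, preserving order."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append((key, value))
    return partitions


events = [("user1", "login"), ("user2", "login"), ("user1", "logout")]
for index, partition in enumerate(route(events, num_partitions=3)):
    print(index, partition)
```

Because the mapping depends only on the key and the partition count, every event for a given key ends up in the same partition, which lets separate partitions be handled in parallel.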
This is just an introduction to Kafka; please refer to the documentation on the Apache site for details.