Irisidea TechSolutions Private Limited

Setting up Apache Druid analytics database to work with Apache Kafka

In this article we show how to use Apache Druid with Apache Kafka. Apache Druid is a real-time analytics database designed for fast slice-and-dice analytics on large data sets. It is most often used where real-time ingestion, fast query performance, and high uptime are important.

We installed Apache Druid 26.0.0, the latest version at the time of writing. Because Druid cannot be installed on Windows, we installed it on Ubuntu, on a machine where Kafka was already running as a single-node cluster. Druid depends on ZooKeeper, so ZooKeeper comes bundled with the Druid download and is set up along with it.

Druid has its own dashboard for visualizing streaming data, which can be used to view the data picked up from any Kafka topic in real time. Apart from Kafka, Druid has built-in connectors for other real-time streaming sources, such as those from Amazon and Azure. We can also query the data already stored in Druid and visualize the results in the dashboard using its query feature. For example, if Druid is connected to a weather data stream, it keeps storing the incoming data in its analytical database, and with a query we can fetch the data for the previous few hours to visualize the hourly changes in humidity or temperature.
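
To make the Kafka-to-Druid flow and the weather example more concrete, here is a minimal Python sketch. It is not the exact setup we used. The topic and datasource name "weather", the field names "timestamp", "station", "temperature" and "humidity", the Kafka address localhost:9092, and the router address http://localhost:8888 are all assumptions chosen for illustration. The sketch builds a Kafka supervisor spec (the JSON that tells Druid which topic to read) and a Druid SQL query that averages temperature and humidity per hour over the last few hours.

```python
import json
import urllib.request

# Assumption: the Druid router of a local single-server install listens here.
DRUID_ROUTER = "http://localhost:8888"


def kafka_supervisor_spec(topic, bootstrap_servers, datasource):
    """Build a minimal Kafka supervisor spec for JSON-encoded messages."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical message layout: an ISO-8601 "timestamp" field.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                # Hypothetical columns for the weather example.
                "dimensionsSpec": {
                    "dimensions": [
                        "station",
                        {"type": "double", "name": "temperature"},
                        {"type": "double", "name": "humidity"},
                    ]
                },
                "granularitySpec": {
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                # Start from the oldest messages still available in the topic.
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def hourly_weather_sql(datasource, hours):
    """Druid SQL: hourly averages over the last `hours` hours."""
    return (
        "SELECT TIME_FLOOR(__time, 'PT1H') AS hour_start, "
        "AVG(temperature) AS avg_temperature, "
        "AVG(humidity) AS avg_humidity "
        f'FROM "{datasource}" '
        f"WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '{int(hours)}' HOUR "
        "GROUP BY 1 ORDER BY 1"
    )


def post_json(path, payload):
    """POST a JSON payload to the Druid router and return the decoded reply."""
    request = urllib.request.Request(
        DRUID_ROUTER + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Druid and Kafka running, calling post_json("/druid/indexer/v1/supervisor", kafka_supervisor_spec("weather", "localhost:9092", "weather")) would start the ingestion, and post_json("/druid/v2/sql", {"query": hourly_weather_sql("weather", 6)}) would return the hourly averages for the last six hours. The same spec and query can also be entered through the Druid dashboard instead of the HTTP API.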