Apache Kafka

June 4, 2025

Driving Streaming Intelligence On-Premises: Real-Time ML with Apache Kafka and Flink

Companies that want to make real-time decisions from big data are looking for a suitable architecture for that data as quickly as possible. With many companies, including SaaS users, choosing to run their infrastructure entirely on their own, the combination of Apache Flink and Kafka offers low-latency data pipelines built for reliability. Because of the financial and technical constraints this brings, small and medium-sized enterprises often have...

By Gautam Goswami. Categories: Apache Kafka, Artificial Intelligence, Big Data, Data Engineering, Machine Learning

April 11, 2024

Transferring real-time data stream processed by Apache Flink to Kafka to Druid for analysis

Businesses can react quickly and effectively to user behavior patterns by using real-time analytics. This allows them to take advantage of opportunities that might otherwise pass them by and prevent problems from getting worse. Apache Kafka, a popular event streaming platform, can be used for real-time ingestion of data/events generated from various sources across multiple verticals such as IoT, financial transactions, inventory, etc. This data can then be streamed into multiple downstream applications or engines for further processing and eventual...

By Gautam Goswami. Categories: Apache Druid, Apache Kafka, Big Data, Data Ingestion, Data Science. Tags: apache druid, Apache Flink, Apache Kafka, Kafka and Druid, kafka flink and druid, multi-broker kafka cluster, real time analytics with apache druid, real time stream processing with apache flink, real time streaming with kafka, Real-time data analytics, real-time data analytics with Apache Flink, sending streaming data from flink to kafka

February 13, 2024

Why Apache Kafka and Apache Flink work incredibly well together to boost real-time data analytics

When data is analyzed and processed in real time, it can yield insights and actionable information either instantly or with very little delay from the time the data is collected. The capacity to collect, handle, and retain user-generated data in real time is crucial for many applications in today’s data-driven environment. The significance of real-time data analytics shows in areas such as timely decision-making, IoT and sensor data processing, enhanced customer experience, proactive problem resolution, fraud detection and security,...

By Gautam Goswami. Categories: Apache Kafka, Data Engineering, Processing Engine. Tags: Apache Flink, Apache Kafka, Big Data, real time analytics, real-time data streaming

January 23, 2024

Integrating rate-limiting and backpressure strategies synergistically to handle and alleviate consumer lag in Apache Kafka

Apache Kafka is a robust distributed streaming platform. Like any system, however, its latency must be monitored and controlled for optimal performance. Kafka consumer lag is the gap between the most recent message in a Kafka topic and the last message a consumer has processed. This lag arises when the consumer cannot keep pace with the rate at which new messages are produced and appended to the topic. Consumer lag in Kafka may...

By Gautam Goswami. Categories: Apache Kafka, Big Data, Data Engineering. Tags: Apache Kafka, Data Streaming, kafka consumer
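
The definition of consumer lag in the excerpt above can be illustrated with a small calculation. This is only a sketch with made-up offsets: the function name `consumer_lag` and the sample numbers are hypothetical, and treating a partition with no committed offset as offset 0 is an assumption made for the example, not Kafka behavior described in the post.

```python
from typing import Dict


def consumer_lag(end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Per-partition lag: latest offset in the log minus the consumer's committed offset."""
    lag = {}
    for partition, end in end_offsets.items():
        # Assumption for this sketch: a partition with no commit yet counts from 0.
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero so a stale end-offset reading never yields negative lag.
        lag[partition] = max(end - committed, 0)
    return lag


# Hypothetical offsets for a three-partition topic.
end_offsets = {0: 1500, 1: 980, 2: 2010}
committed = {0: 1200, 1: 980, 2: 1900}

lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
print(lag, total_lag)
```

A total that keeps growing between measurements suggests the consumer is falling behind the producers, which is the situation rate limiting and backpressure are meant to address.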

December 19, 2023

Architecture to leverage Apache Kafka for sharing large messages (GB size)

In today's data-driven world, the ability to transport and distribute large amounts of data, especially video files, in real time is crucial for news media companies. For example, when an incident occurs at a specific location, a news reporter films the entire situation, and the complete video must then be distributed for broadcasting to the company's studios in geographically distant locations. To build a comprehensive solution for this problem, we can use Apache Kafka in conjunction with...

By Gautam Goswami. Categories: Apache Kafka, Data Engineering, Data Science. Tags: Apache Kafka, Big Data, HDFS

November 20, 2023

The Zero Copy Principle With Apache Kafka

Apache Kafka, a distributed event streaming technology, can process trillions of events each day with high throughput and low latency. That has built trust: over 80% of Fortune 100 businesses use and rely on Kafka. Thousands of companies around the globe currently use Kafka to develop high-performance data pipelines, streaming analytics, data integration, and more. By leveraging the zero-copy principle, Kafka makes data transfer more efficient. In short, when doing computer processes, the zero-copy...

By Gautam Goswami. Categories: Apache Kafka, Data Science. Tags: Apache Kafka, Big Data, event streaming platform

October 13, 2023

Understanding Apache Druid Supervisor and its specification for real-time data ingestion from Apache Kafka

Although Apache Druid and Apache Kafka are both powerful open-source data processing tools, they serve different purposes. Druid is a high-performance, column-oriented, real-time analytical database, while Kafka is a distributed platform for event streaming. They can work together in a typical data pipeline, where Kafka acts as the messaging system that ingests and stores data/events and Druid performs real-time analytics on that data. In short, indexing is the process of loading data into Druid...

By Gautam Goswami. Categories: Apache Druid, Apache Kafka, Big Data, Data Engineering, Data Ingestion. Tags: analyze the data in real-time, apache druid, Apache Kafka, Apache Kafka Indexing Service, Apache Kafka supervisor, distributed platform for event streaming, druid supervisors, How to accept data streams from Apache Kafka, integration between Apache Kafka and Druid for real-time data intake and analytics, open-source data processing tools, Real time data ingestion from Apache Kafka, real-time analytical database, Supervisor and its specification in Apache Druid, Supervisor of Apache Druid, The data ingestion lifecycle, Understanding Apache Druid Supervisor and its specification for real-time data ingestion from Apache Kafka, Understanding Supervisor in Druid

September 25, 2023

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently fail when consumed, regardless of the number of consumption attempts. Poison pill scenarios are frequently underestimated and cause trouble when they are not properly accounted for. Neglecting them can severely disrupt the operation of an event-driven system. A poison pill can arise for various reasons: failure to deserialize the consumed bytes from the Kafka topic on the consumer side; incompatible serializer and deserializer...

By Gautam Goswami. Categories: Apache Kafka, Data Engineering, Data Science. Tags: Apache Kafka, Big Data, Data Engineering, dead letter queue, Kafka Poison Pill, message serialization in kafka, message validation in kafka, Poison Pill, timeouts and deadlines in kafka, versioning the kafka topics
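
To make the deserialization failure mentioned in the excerpt above concrete, here is a minimal sketch of one common remedy, setting undecodable messages aside in a dead letter list instead of retrying them forever. It uses no Kafka client at all: `process_batch`, the JSON payload format, and the sample bytes are hypothetical stand-ins chosen for illustration, not code from the post.

```python
import json
from typing import Any, List, Tuple


def process_batch(raw_messages: List[bytes]) -> Tuple[List[Any], List[bytes]]:
    """Deserialize each message; park anything that cannot be decoded."""
    processed: List[Any] = []
    dead_letters: List[bytes] = []
    for raw in raw_messages:
        try:
            processed.append(json.loads(raw.decode("utf-8")))
        except (UnicodeDecodeError, json.JSONDecodeError):
            # A poison pill fails on every attempt, so it is set aside
            # rather than retried, and the rest of the batch continues.
            dead_letters.append(raw)
    return processed, dead_letters


# Hypothetical batch: the middle message is not valid UTF-8 JSON.
batch = [b'{"id": 1}', b'\xff\xfe not json', b'{"id": 2}']
processed, dead_letters = process_batch(batch)
print(processed, dead_letters)
```

In a real consumer the parked bytes would typically be published to a separate dead letter topic for later inspection, so the main consumer keeps advancing past the bad record.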

August 2, 2023

How Kafka Works?

Real-time data streaming with Kafka is a popular and powerful approach to handling large volumes of data and facilitating communication between different systems and applications. Apache Kafka is an open-source distributed event streaming platform that allows you to publish, subscribe to, store, and process streams of records. Here's a general overview of how real-time data streaming with Kafka works: Topic and Message Model: Data is organized into topics, which are essentially log-like data streams. Each message within a topic consists of a key, value, and timestamp. Publishers...

By Kislay Komal. Categories: Apache Kafka, Data Engineering. Tags: Apache Kafka, How kafka works?, Kafka, kafka. working with apache kafka
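
The topic and message model described in the excerpt above can be pictured with a toy in-memory structure. This is an illustrative sketch only: the `Message` and `Topic` classes are hypothetical, and a single Python list stands in for the log, whereas real Kafka topics are split into partitions and stored on brokers.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Message:
    """One record: a key, a value, and a timestamp."""
    key: Optional[str]
    value: str
    timestamp: float


@dataclass
class Topic:
    """A named, append-only log of messages (single log for simplicity)."""
    name: str
    log: List[Message] = field(default_factory=list)

    def publish(self, key: Optional[str], value: str) -> int:
        """Append a message and return its offset (position in the log)."""
        self.log.append(Message(key, value, time.time()))
        return len(self.log) - 1

    def read_from(self, offset: int) -> List[Message]:
        """Return every message at or after the given offset."""
        return self.log[offset:]


orders = Topic("orders")
first = orders.publish("user-1", "created")
second = orders.publish("user-1", "paid")
unread = orders.read_from(second)
print(first, second, [m.value for m in unread])
```

Because messages are only ever appended, a reader can resume from any remembered offset, which is the property that lets independent subscribers consume the same stream at their own pace.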

June 16, 2023

Forging Apache Druid with Apache Kafka for real-time streaming analytics

Apache Druid is a real-time analytics database designed for quick slice-and-dice analysis on massive data volumes. Druid works best with event-oriented data and is frequently used as the database backend for analytical application GUIs and for highly concurrent APIs that require quick aggregations. It is most effective where real-time ingestion, fast query performance, and high uptime are crucial. At the other end, Apache Kafka is gaining outstanding momentum as a distributed event streaming platform with excellent...

By Gautam Goswami. Categories: Apache Kafka, Data Engineering. Tags: apache druid, Apache Kafka, Forging Apache Druid with Apache Kafka, Real time streaming with kafka and druid, using Apache Kafka with Apache druid
