Yearly Archives - 2023

Using Kafka to manage Large Messages

Architecture to leverage Apache Kafka for sharing large messages (GB size)

In today's data-driven world, the capability to transport and circulate large amounts of data, especially video files, in real-time is crucial for news media companies. For example, an incident occurred in a specific location, and a news reporter promptly filmed the entire situation. Subsequently, the complete video was distributed for broadcasting across their multiple studios situated in geographically distant locations. To construct or create a comprehensive solution for the given problem statement, we can utilize Apache Kafka in conjunction with...

Read more...
Zero Copy Principle

The Zero Copy Principle With Apache Kafka

The Apache Kafka, a distributed event streaming technology, can process trillions of events each day and eventually demonstrate its tremendous throughput and low latency. That’s building trust and over 80% of Fortune 100 businesses use and rely on Kafka. To develop high-performance data pipelines, streaming analytics, data integration, etc., thousands of companies presently use Kafka around the globe. By leveraging the zero-copy principle, Kafka improves efficiency in terms of data transfer. In short, when doing computer processes, the zero-copy...

Read more...
Druid Kafka Supervisor

Understanding Apache Druid Supervisor and its specification for real-time data ingestion from Apache Kafka

Although both Apache Druid and Apache Kafka are potent open-source data processing tools, they have diverse uses. While Druid is a high-performance, column-store, real-time analytical database, Kafka is a distributed platform for event streaming. However, they can work together in a typical data pipeline scenario where Kafka is used as a messaging system to ingest and store data/events, and Druid is used to perform real-time analytics on that data. In short, the indexing is the process of loading data in Druid...

Read more...
Causes and remedies of poison pill in Apache Kafka

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message deliberately sent to a Kafka topic, designed to consistently fail when consumed, regardless of the number of consumption attempts. Poison Pill scenarios are frequently underestimated and can arise if not properly accounted for. Neglecting to address them can result in severe disruptions to the seamless operation of an event-driven system. The poison pill for various reasons: The failure of deserialization of the consumed bytes from the Kafka topic on the consumer side. Incompatible serializer and deserializer...

Read more...
How Kafka Works?

How Kafka Works?

#realtimedata streaming with #kafka is a popular and powerful approach to handle large volumes of data and facilitate communication between different systems and applications. #apachekafka is an open-source distributed event streaming platform that allows you to publish, subscribe, store, and process streams of records. Here's a general overview of how real-time data streaming with Kafka works: Topic and Message Model: Data is organized into topics, which are essentially log-like data streams. Each message within a topic consists of a key, value, and timestamp. Publishers...

Read more...

The significance of deep storage in Apache Druid

The phrase “deep storage” refers to the long-term storage system used by Apache Druid, where past data segments are preserved for durability and retrieval in the future. Druid stores data in files called segments and deep storage is the place where segments are stored. Even though Druid’s native integration with Apache Kafka (can read here how to integrate Druid with Kafka) and Amazon Kinesis, which allows query-on-arrival at millions of events per second, low latency ingestion, etc., and eventually enables us to...

Read more...
Forging Apache Druid with Apache Kafka for real-time streaming analytics

Forging Apache Druid with Apache Kafka for real-time streaming analytics

A real-time analytics database called Apache Druid is developed for quick slice-and-dice analysis on massive data volumes. The best data for Apache Druid is event-oriented and frequently utilized as the database backend for analytical application GUIs and for highly concurrent APIs that require quick aggregations. Druid can be leveraged very effectively where real-time ingestion, fast query performance, and high uptime are crucial. At the other end, Apache Kafka is gaining outstanding momentum as a distributed event streaming platform with excellent...

Read more...

Knowing and valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, We should first carefully examine the replication process in the Kafka broker. In short, replication means having multiple copies of our data spread across multiple brokers. Maintaining the same copies of data in different brokers makes possible the high availability in case one or more brokers go down or are untraceable in a multi-node Kafka cluster to server the requests. Because of this reason, it is mandatory to mention how...

Read more...

Handling bad messages via DLQ by configuring JDBC Kafka Sink Connector

Any trustworthy data streaming pipeline needs to be able to identify and handle faults. Exceptionally while IoT devices ingest endlessly critical data/events into permanent persistence storage like RDBMS for future analysis via multi-node Apache Kafka cluster. (Please click here to read how to setup multi-node Apache Kafka Cluster). There could be scenarios where IoT devices might send fault/bad events due to various reasons at the source points and henceforth appropriate actions can be executed to correct it further. The Apache  Kafka...

Read more...

Streaming Data to RDBMS via Kafka JDBC Sink Connector without leveraging Schema Registry

In today’s M2M (Machine to machine) communications landscape, there is a huge requirement for streaming the digital data from heterogeneous IoT devices to the various RDBMS for further analysis via the dashboard, triggering different events to perform numerous actions. To support the above scenarios Apache Kafka acts like a central nervous system where data can be ingested from various IoT devices and persisted into various types of the repository, RDBMS, cloud storage, etc. Besides, various types of data pipelines...

Read more...