Gautam Goswami

October 13, 2023

Understanding Apache Druid Supervisor and its specification for real-time data ingestion from Apache Kafka

Although Apache Druid and Apache Kafka are both powerful open-source data processing tools, they serve different purposes. Druid is a high-performance, column-oriented, real-time analytical database, while Kafka is a distributed platform for event streaming. They can work together in a typical data pipeline, where Kafka is used as a messaging system to ingest and store data/events and Druid is used to perform real-time analytics on that data. In short, indexing is the process of loading data into Druid...

September 25, 2023

Causes and remedies of poison pill in Apache Kafka

A poison pill is a message on a Kafka topic that consistently fails when consumed, regardless of the number of consumption attempts. Poison pill scenarios are frequently underestimated and can arise if not properly accounted for. Neglecting to address them can result in severe disruptions to the seamless operation of an event-driven system. A poison pill can occur for various reasons: the failure of deserialization of the consumed bytes from the Kafka topic on the consumer side; an incompatible serializer and deserializer...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering, Data Science | Tags: Apache Kafka, Big Data, Data Engineering, dead letter queue, Kafka Poison Pill, message serialization in kafka, message validation in kafka, Poison Pill, timeouts and deadlines in kafka, versioning the kafka topics
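
The deserialization failure described in the teaser above can be illustrated with a minimal sketch. It does not use a Kafka client at all: the list `raw_records`, the function `process_records`, and the in-memory `dead_letters` list are hypothetical stand-ins for a consumed partition and a dead letter queue topic, and JSON is assumed as the message format purely for illustration.

```python
import json


def process_records(raw_records):
    """Decode raw byte payloads; set aside the ones that cannot be decoded."""
    processed, dead_letters = [], []
    for offset, payload in enumerate(raw_records):
        try:
            # Assumed format: UTF-8 encoded JSON (not stated in the article).
            event = json.loads(payload.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # A poison pill fails on every retry, so park it and move on
            # instead of blocking the rest of the stream.
            dead_letters.append(
                {"offset": offset, "payload": payload, "error": type(exc).__name__}
            )
            continue
        processed.append(event)
    return processed, dead_letters


# Hypothetical input: the second payload is not valid UTF-8.
raw_records = [b'{"id": 1}', b"\xff\xfe not json", b'{"id": 2}']
processed, dead_letters = process_records(raw_records)
print(processed)
print([entry["offset"] for entry in dead_letters])
```

In a real consumer the parked payload would typically be produced to a separate topic together with the error details, so that it can be inspected later without stopping consumption.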

July 6, 2023

The significance of deep storage in Apache Druid

The phrase “deep storage” refers to the long-term storage system used by Apache Druid, where historical data segments are preserved for durability and future retrieval. Druid stores data in files called segments, and deep storage is where those segments are kept. Even though Druid’s native integration with Apache Kafka (you can read here how to integrate Druid with Kafka) and Amazon Kinesis allows query-on-arrival at millions of events per second, low-latency ingestion, etc., and eventually enables us to...

By Gautam Goswami | Categories: Data Engineering, Data Science | Tags: apache druid, Apache Kafka, Zookeeper

June 16, 2023

Forging Apache Druid with Apache Kafka for real-time streaming analytics

Apache Druid is a real-time analytics database designed for quick slice-and-dice analysis on massive data volumes. Druid works best with event-oriented data and is frequently used as the database backend for analytical application GUIs and for highly concurrent APIs that require quick aggregations. Druid can be leveraged very effectively where real-time ingestion, fast query performance, and high uptime are crucial. At the other end, Apache Kafka is gaining outstanding momentum as a distributed event streaming platform with excellent...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering | Tags: apache druid, Apache Kafka, Forging Apache Druid with Apache Kafka, Real time streaming with kafka and druid, using Apache Kafka with Apache druid

June 1, 2023

Knowing and valuing Apache Kafka’s ISR (In-Sync Replicas)

To get more clarity about ISR in Apache Kafka, we should first carefully examine the replication process in the Kafka broker. In short, replication means having multiple copies of our data spread across multiple brokers. Maintaining identical copies of the data on different brokers makes high availability possible when one or more brokers in a multi-node Kafka cluster go down or become unreachable and cannot serve requests. For this reason, it is mandatory to mention how...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering | Tags: Apache Kafka, Apache Zookeeper, In-Sync Replica, Kafka Topics, multi node cluster, replication

April 10, 2023

Handling bad messages via DLQ by configuring JDBC Kafka Sink Connector

Any trustworthy data streaming pipeline needs to be able to identify and handle faults. This is especially true when IoT devices continuously ingest critical data/events, via a multi-node Apache Kafka cluster, into permanent storage such as an RDBMS for future analysis. (Please click here to read how to set up a multi-node Apache Kafka cluster.) There could be scenarios where IoT devices send faulty or bad events for various reasons at the source points, and appropriate actions can then be taken to correct them. The Apache Kafka...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering | Tags: Apache Kafka, Big Data, JDBC Sink Connector, Kafka connect, schema registry
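
As a rough illustration of the topic of the post above, the sketch below builds a JDBC sink connector configuration that enables Kafka Connect's dead letter queue settings. The `errors.*` keys are standard Kafka Connect sink properties; the connector name, topic names, and connection URL are hypothetical placeholders, and this is not necessarily the configuration the article itself uses.

```python
import json

# Hypothetical names and URL, used only for illustration.
connector = {
    "name": "iot-jdbc-sink",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
        "topics": "iot-events",
        "connection.url": "jdbc:mysql://localhost:3306/iot",
        # Keep the task running when a record fails instead of stopping it.
        "errors.tolerance": "all",
        # Route failed records to a separate topic for later inspection.
        "errors.deadletterqueue.topic.name": "iot-events-dlq",
        "errors.deadletterqueue.topic.replication.factor": "1",
        # Attach the failure reason to each dead-lettered record as headers.
        "errors.deadletterqueue.context.headers.enable": "true",
    },
}

# This JSON document is the payload shape expected by the Kafka Connect REST API.
payload = json.dumps(connector, indent=2)
print(payload)
```

Which failures actually reach the dead letter topic depends on the stage at which they occur and on the connector version, so the behaviour is worth checking against the documentation of the connector in use.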

February 19, 2023

Streaming Data to RDBMS via Kafka JDBC Sink Connector without leveraging Schema Registry

In today’s M2M (machine-to-machine) communications landscape, there is a huge requirement for streaming digital data from heterogeneous IoT devices to various RDBMSs for further analysis via dashboards and for triggering different events to perform numerous actions. To support these scenarios, Apache Kafka acts like a central nervous system where data can be ingested from various IoT devices and persisted into various types of repositories, such as RDBMS, cloud storage, etc. Besides, various types of data pipelines...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering | Tags: Apache Kafka, JDBC Sink Connector, Kafka connect, Kafka Topics, schema registry

January 9, 2023

Resolve Apache Kafka starting issue installed on Single/Multi-node cluster

This short article explains how to resolve the error “ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer) kafka.common.InconsistentClusterIdException:” that appears when we start Apache Kafka installed and configured on a multi-node cluster. You can read here the steps for setting up a multi-node Apache Kafka cluster. In a ZooKeeper-based deployment, Kafka alone cannot form the complete Kafka cluster without Apache ZooKeeper, because ZooKeeper handles the leadership election of Kafka brokers and manages service discovery as well as cluster topology. It also tracks when topics are...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering | Tags: adding new broker into a kafka cluster, additional zookeeper server entry, Apache Kafka, Apache Kafka starting issue, Apache Zookeeper, configuration changes on Zookeeper, ERROR Fatal error during KafkaServer startup, Multi-node kafka cluster, Prepare to shutdown (kafka.server.KafkaServer)kafka.common.InconsistentClusterIdException, Resolving Apache Kafka starting issue, Resolving Apache Kafka starting issue on Multi Node Cluster, Resolving Apache Kafka starting issue on Single Node Cluster, single node Kafka broker, Updating Cluster ID in meta properties file, updating server.properties in kafka

January 6, 2023

Few intrinsic of Apache Zookeeper and their importance

From a bird’s-eye view, Apache Zookeeper provides coordination services for managing distributed applications. It is responsible for providing configuration information, naming, synchronization, and group services over large clusters in distributed systems. For example, Apache Kafka uses Zookeeper to choose the leader node for topic partitions. Please click here if you want to read how to set up a multi-node Apache Zookeeper cluster on Ubuntu/Linux. zNodes: The key concept of Zookeeper is the znode, which can be acted...

By Gautam Goswami | Categories: Apache Hadoop, Apache Kafka, Architecture, Data Engineering, Hadoop Eco System | Tags: Apache Kafka, Apache Zookeeper, Big Data, Concept of zNodes, data log directory, dataDir parameter, dataLogDir parameter, distributed file systems, ephemeral zNodes, Hadoop, how to setup the multi-node Apache Zookeeper, Importance of Apache Zookeeper, persistence zNodes, receive notifications about changes to the ZooKeeper ensemble through watches, sequentialzNodes, Usage of Apache Zookeeper in Kafka Multi Node CLuster, Using Apache Zookeeper for managing distributed applications, ZooKeeper Data Directory, ZooKeeper ensemble, Zookeeper Quorum

November 16, 2022

Overcome LEADER_NOT_AVAILABLE error on Multi-node Apache Kafka Cluster

Kafka Connect plays a significant part in streaming data between Apache Kafka and other data systems. As a tool, it provides a scalable and reliable way to move data in and out of Apache Kafka. Importing data from a database into Apache Kafka is perhaps the most well-known use case of the JDBC Connector (Source & Sink) that belongs to Kafka Connect. This short article aims to elaborate on the steps on how we can...

By Gautam Goswami | Categories: Apache Kafka, Data Engineering, Multi Node Hadoop Cluster Setup | Tags: Apache Kafka, Importing data from the Database, JDBC Sink Connector, Kafka cluster network, Kafka connect, LEADER_NOT_AVAILABLE, message consumer, message producer, multi-broker Apache Kafka Cluster, Multi-node Apache Kafka Cluster, Multi-node kafka cluster, MySQL Database to Kafka topic, streaming data

Author - Gautam Goswami
