Data Engineering

iDropper – The Data Ingestion, Monitoring and Reporting Tool

In today’s complex business world, the data an organization owns, and how it uses it, is what sets it apart: it is what lets the organization innovate, compete better and stay ahead. That is the driving factor for organizations to collect and process as much data as possible, transform it into information through data-driven discoveries, and deliver it to the end user in the right format for smart decision-making. Common challenges/concerns: fetching the raw data files from the various data...

Read more...

Error while batch processing data at rest persisted in a basic Hadoop-based (HDFS) Data Lake: “Permission denied: user=dr.who, access=READ_EXECUTE, inode=”/tmp”:hdadmin:supergroup:drwx……..”

Typically, persisting unstructured data and batch processing it afterwards can be very costly, which makes it hard to justify for small organizations and startups where cost is the prime factor. A Hadoop-based Data Lake using MapReduce fits this scenario perfectly: it is not only cost-effective but also scalable and easy to extend further. As appealing as that sounds, we might still face issues while setting it up, and one of the common ones is the error "Permission...
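
The error above typically appears when HDFS is browsed through the NameNode web UI, which by default acts as the static user dr.who (the hadoop.http.staticuser.user property) and so cannot read directories owned by the HDFS superuser. A minimal sketch, assuming a cluster reachable at hdfs://localhost:9000 (replace with your own fs.defaultFS), that inspects the owner, group and permission bits of /tmp through the Hadoop FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TmpPermissionCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumption: adjust fs.defaultFS to your NameNode address.
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        try (FileSystem fs = FileSystem.get(conf)) {
            FileStatus status = fs.getFileStatus(new Path("/tmp"));
            // Prints something like: owner=hdadmin group=supergroup permission=rwx------
            System.out.println("owner=" + status.getOwner()
                    + " group=" + status.getGroup()
                    + " permission=" + status.getPermission());
        }
    }
}

Depending on the cluster, the usual remedies are relaxing the permissions on the directory (hdfs dfs -chmod) or pointing hadoop.http.staticuser.user at a user that is allowed to read it; the linked article discusses the issue in detail.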

Read more...

Resolved – “Incompatible clusterIDs in…” in Multi-Node Hadoop Cluster Setup

Currently, many startups and small companies, along with their customers, are working on Data Analytics, ML, AI and related solutions. Due to budget constraints, some of them do not want to rely on cloud-based storage; instead, to process ingested data, they create a basic Data Lake using HDFS. While starting the NameNode (master node) in a multi-node Hadoop cluster, they might encounter the exception "org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /home/....". This may occur in the following scenarios: ...
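
One quick way to confirm the mismatch is to compare the clusterID recorded in the VERSION files of the storage directories involved, which are plain key=value properties files. A minimal sketch with placeholder paths (take the real ones from dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml):

import java.io.FileInputStream;
import java.util.Properties;

public class ClusterIdCheck {
    // Placeholder paths for illustration only; use the values from hdfs-site.xml.
    private static final String NN_VERSION = "/home/hduser/hdfs/namenode/current/VERSION";
    private static final String DN_VERSION = "/home/hduser/hdfs/datanode/current/VERSION";

    public static void main(String[] args) throws Exception {
        String nnId = readClusterId(NN_VERSION);
        String dnId = readClusterId(DN_VERSION);
        System.out.println("NameNode clusterID: " + nnId);
        System.out.println("DataNode clusterID: " + dnId);
        System.out.println(nnId.equals(dnId) ? "clusterIDs match" : "Incompatible clusterIDs");
    }

    private static String readClusterId(String path) throws Exception {
        Properties props = new Properties();
        try (FileInputStream in = new FileInputStream(path)) {
            props.load(in);
        }
        return props.getProperty("clusterID", "<missing>");
    }
}

If the IDs differ, the linked article covers the scenarios that cause the mismatch and how to resolve it safely.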

Read more...

Network Topology To Create Multi Node Hybrid Cluster For Hadoop Installation

The aim of this article is to outline a network topology for installing Hadoop on a multi-node hybrid cluster with limited hardware resources. Such a cluster is useful for learning Hadoop and for processing smaller volumes of unstructured data with various engines. Before the cluster setup: we installed Hadoop on a single-node cluster running Ubuntu 14.04 on top of Windows 10 using VMware Workstation Player, and later copied the .vmx file into multiple...
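
Because every VM in a setup like this starts from the same cloned image, each node needs a unique hostname and a consistent /etc/hosts mapping before Hadoop is configured. A small, hedged sketch (the hostnames below are illustrative only) for verifying from any node that all cluster names resolve:

import java.net.InetAddress;

public class HostResolutionCheck {
    public static void main(String[] args) throws Exception {
        // Illustrative hostnames; replace with the entries from your /etc/hosts.
        String[] nodes = {"hadoop-master", "hadoop-slave1", "hadoop-slave2"};
        for (String node : nodes) {
            InetAddress addr = InetAddress.getByName(node);
            System.out.println(node + " -> " + addr.getHostAddress());
        }
    }
}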

Read more...

Data Governance & Security Mechanism in Distributed Data Storage System

We are aware that traditional data storage mechanisms cannot hold, for further use, the massive volume of data now generated at lightning speed, even with vertical scaling. At the same time, we expect only one fuel, DATA, to accelerate rapid growth across all sectors, from business to natural resources and medicine. But the question is: how do we persist this massive volume of data for processing? The answer is storing the data...
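
On the security side of that answer, distributed storage such as HDFS is typically locked down with Kerberos authentication plus per-path permissions. A minimal, hedged sketch of reading from a Kerberos-secured HDFS cluster; the NameNode address, principal and keytab path are placeholders, not values from the article:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHdfsAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder values; a secured cluster normally ships these in core-site.xml / hdfs-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal and keytab, shown only to illustrate keytab-based login.
        UserGroupInformation.loginUserFromKeytab("etluser@EXAMPLE.COM",
                "/etc/security/keytabs/etluser.keytab");

        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus st : fs.listStatus(new Path("/data"))) {
                System.out.println(st.getPath() + " owner=" + st.getOwner()
                        + " perm=" + st.getPermission());
            }
        }
    }
}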

Read more...

Processing and Analysis of Big Telecom Data to minimize crime, combat terrorism, antisocial activities, etc.

Telecom providers have a treasure trove of captive data - customer data, CDRs (call detail records), call center interactions, tower logs, etc. - and are metaphorically “sitting on a gold mine”. Ideally, each category of the generated data carries the following information. ⦁ Customer data consolidates customer id, plan details, demographics, subscribed services and spending patterns. ⦁ Service data consolidates customer type, customer history, complaint category, query resolution status, etc. ⦁ Usually, for the smart mobile phone subscriber,...
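
To make those categories concrete, below is a hypothetical Java model of a single CDR entry of the kind such a pipeline would ingest; the field names are illustrative, not an actual telecom schema:

import java.time.Instant;

// Hypothetical shape of one call detail record (CDR); fields are illustrative only.
public class CallDetailRecord {
    private final String callerId;       // subscriber originating the call
    private final String calleeId;       // subscriber receiving the call
    private final String cellTowerId;    // tower that served the call (useful for location analysis)
    private final Instant startTime;     // when the call started
    private final long durationSeconds;  // call duration

    public CallDetailRecord(String callerId, String calleeId, String cellTowerId,
                            Instant startTime, long durationSeconds) {
        this.callerId = callerId;
        this.calleeId = calleeId;
        this.cellTowerId = cellTowerId;
        this.startTime = startTime;
        this.durationSeconds = durationSeconds;
    }

    public String getCallerId()      { return callerId; }
    public String getCalleeId()      { return calleeId; }
    public String getCellTowerId()   { return cellTowerId; }
    public Instant getStartTime()    { return startTime; }
    public long getDurationSeconds() { return durationSeconds; }
}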

Read more...

Deleting the Solr log files/folder from the Standby NameNode can be a disaster while the Primary NameNode is active in an HDP (Hortonworks Data Platform) Hadoop Cluster

Most of us know that Apache Ambari is used for provisioning, managing and monitoring the different components of a Hortonworks Hadoop cluster. We also know that Apache Ranger can serve as a centralized security administration solution for Hadoop, enabling administrators to create and enforce security policies for HDFS and other Hadoop platform components. When the Ranger HDFS plugin is enabled and audit logging is configured, it writes client interaction activity to Solr. The default location of these Solr log files...
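
One operational safeguard before touching anything on a NameNode host in such an HA pair is to confirm which NameNode is currently active. A minimal sketch that simply shells out to the standard hdfs haadmin command; the service id "nn1" is a placeholder for whatever is listed under dfs.ha.namenodes.<nameservice> in hdfs-site.xml:

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class CheckNameNodeState {
    public static void main(String[] args) throws Exception {
        // Placeholder service id; use the ids from dfs.ha.namenodes.<nameservice>.
        ProcessBuilder pb = new ProcessBuilder("hdfs", "haadmin", "-getServiceState", "nn1");
        pb.redirectErrorStream(true);
        Process p = pb.start();
        try (BufferedReader reader =
                     new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);   // typically prints "active" or "standby"
            }
        }
        p.waitFor();
    }
}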

Read more...

Basic Understanding Of Stateful Data Streaming Supported By Apache Flink

Big Data processing platforms are maturing to execute streaming data efficiently, and streaming is becoming a major focus for making business decisions instantly, especially in the telecom and retail sectors. Data collected continuously from sensors fitted to heavy industrial equipment, clickstreams from an e-commerce application’s navigation, and similar feeds can all be considered streaming data sources. By leveraging a streaming application, we can process and analyze this continuous flow of data without...
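
As a concrete illustration of keyed state in Flink’s DataStream API, the hedged sketch below keeps a running count per word in a ValueState and emits the updated count for every element; the socket source on localhost:9999 (fed by something like nc -lk 9999) is an assumption for demonstration only:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class RunningCountPerKey {

    // Holds one Long of keyed state per word and emits the updated running count.
    public static class CountFunction extends RichFlatMapFunction<String, Tuple2<String, Long>> {
        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Types.LONG));
        }

        @Override
        public void flatMap(String word, Collector<Tuple2<String, Long>> out) throws Exception {
            Long current = count.value();
            long updated = (current == null ? 0L : current) + 1L;
            count.update(updated);
            out.collect(Tuple2.of(word, updated));
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.socketTextStream("localhost", 9999)          // assumed source: nc -lk 9999
           .flatMap((String line, Collector<String> out) -> {
               for (String w : line.split("\\s+")) {
                   if (!w.isEmpty()) {
                       out.collect(w);
                   }
               }
           })
           .returns(Types.STRING)                        // needed because of lambda type erasure
           .keyBy(word -> word)
           .flatMap(new CountFunction())
           .print();
        env.execute("Running count with keyed state");
    }
}

Because the stream is keyed, Flink scopes the ValueState to each word automatically and, when checkpointing is enabled, includes it in the job’s fault-tolerance snapshots.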

Read more...

Apache Flink – A 4G Data Processing Engine

Analyzing streaming data in large-scale systems is becoming more of a focal point every day for making accurate business decisions, driven by the mushrooming of digital data sources around the globe, including social media. Real-time analytics is increasingly attractive because of the insights available from the time-value of data (in other words, while data is in motion). Apache Flink, a highly innovative open source stream processing engine, has emerged to help take advantage of stream-based approaches. Besides...

Read more...