Hadoop Distributed File System

January 11, 2021

Distributed Incubator

By Kislay Komal

December 12, 2017

Fault Tolerance Enhancement On Apache Hadoop 3.0.0-alpha2 For Supporting More Than 2 NameNodes

The NameNode is the most critical resource in a Hadoop core cluster. Once very large files are loaded into the Hadoop Distributed File System (HDFS), they are split into block-sized chunks according to the configured block size (128 MB by default in Hadoop 2.x and later; 64 MB in Hadoop 1.x). The chunks are then stored as independent units across the data nodes in the cluster. The primary responsibility of the data nodes is to hold the actual data in the form of chunks, while the NameNode holds the information about where all the chunks are located/stored in the...

By Kislay Komal, in Apache Hadoop
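
The block splitting described in the excerpt above can be illustrated with a little arithmetic. This is a minimal sketch, not HDFS code: the function name `split_into_blocks` is hypothetical, and the 128 MB default is an assumption that matches `dfs.blocksize` in Hadoop 2.x and later (pass a different `block_size` for other configurations).

```python
# Assumed default block size: 128 MB (dfs.blocksize in Hadoop 2.x and later).
DEFAULT_BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Return the sizes in bytes of the blocks needed for a file.

    Every block is full-sized except possibly the last one, which only
    occupies as many bytes as remain in the file.
    """
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")

    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)
    return sizes


if __name__ == "__main__":
    # A file one byte larger than 1 GiB needs one extra, nearly empty block.
    blocks = split_into_blocks(1024 ** 3 + 1)
    print(len(blocks), "blocks; last block holds", blocks[-1], "byte(s)")
```

Each of these blocks is what a data node stores as an independent unit; the NameNode only keeps the mapping from the file to its block locations.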

September 17, 2017

Steering number of mapper (MapReduce) in sqoop for parallelism of data ingestion into Hadoop Distributed File System (HDFS)

To import data from most data sources, such as an RDBMS, Sqoop internally uses mappers. Before delegating the work to the mappers, Sqoop performs a few initial operations in sequence once we execute the command on a terminal on any node in the Hadoop cluster. Ideally, in a production environment, Sqoop is installed on a separate node, and the .bashrc file is updated to append Sqoop's binary and configuration paths, which lets us execute Sqoop commands from anywhere in the multi-node cluster. Most of the...

By Gautam Goswami, in Data Engineering, Data Ingestion

September 8, 2017

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS). Because of the distributed storage mechanism in HDFS, we can store data of any format in huge volumes. In an RDBMS, data persists in row-and-column format (known as structured data). To process the huge volume of enterprise data, we can leverage HDFS as a basic data lake. In this...

By Gautam Goswami, in Data Engineering, Hadoop Eco System

June 8, 2017

Why Lambda Architecture in Big Data Processing

Due to the exponential growth of digitization, the world creates at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, which we can call Big Data. Data is generated everywhere: social media sites, various sensors, satellites, purchase transactions, mobile devices, GPS signals, and much more. With the advancement of technology, data generation shows no sign of slowing down; instead, it will grow massively. All the major organizations, retailers,...

By Gautam Goswami, in Architecture, Data Engineering

May 29, 2017

Apache Kafka, The next Generation Distributed Messaging System

In a Big Data project, the main challenge is collecting an enormous volume of data. We need distributed, high-throughput messaging systems to overcome it. Apache Kafka is designed to address this challenge. It was originally developed at LinkedIn Corporation and later became an Apache project. A messaging system is typically responsible for transferring data from one application to another. A message is simply a bundle of data/information. To ingest a huge volume of data into Hadoop...

By Gautam Goswami, in Data Ingestion

May 19, 2017

Research Papers & Publications

1. Effective Image Analysis on Twitter Streaming using Hadoop Eco System on Amazon Web Service EC2. We have published a research paper on Hadoop and its ecosystem, based on a real-time case study, in the “International Journal of Advanced Research in Computer Science and Software Engineering” (ISSN: 2277 128X). Title: Effective Image Analysis on Twitter Streaming using Hadoop Eco System on Amazon Web Service EC2. Paper ID: V5I9-0359. Abstract: Twitter is becoming the...

April 5, 2017

Basic concept of Data Lake

The infographic on the left represents the basic concept of a Data Lake, where we can use the ELT approach (extraction, loading, and then transformation) instead of the traditional ETL process (extraction, transformation, and then loading). The ETL process applies to traditional data warehousing systems, which follow a structured data format (rows and columns). By leveraging HDFS (Hadoop Distributed File System), we can develop a data lake that stores data of any format for processing and analysis. Data can be loaded directly into the lake...

By Gautam Goswami, in Data Engineering, Storage Mechanism

February 14, 2017

Semi-Structured Data

Semi-structured data lies between structured and unstructured data. Data stored in a traditional database system or an Excel sheet, organized in columns and rows, is structured data. Unstructured data is any data or piece of information that can't be stored in databases, RDBMS, etc. Emails, Facebook comments, newspapers, etc. are examples of unstructured data. Semi-structured data does not follow a strict data model structure and is neither raw data nor typed data in...

By Gautam Goswami, in Apache Hadoop, Data Engineering
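
To make the distinction in the excerpt above concrete, here is a small sketch of semi-structured data in JSON form. The records, field names, and helper functions are all hypothetical and chosen only for illustration. It shows that each record carries its own field names, no two records need the same fields, and forcing them into fixed columns leaves gaps.

```python
import json

# Hypothetical sample records: every field name here is made up.
RAW = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "tags": ["hadoop", "kafka"]},
  {"id": 3, "comment": {"text": "Nice post", "likes": 4}}
]"""

records = json.loads(RAW)


def field_names(items):
    """Collect every top-level field that appears in at least one record."""
    names = set()
    for item in items:
        names.update(item)
    return sorted(names)


def to_rows(items, columns):
    """Flatten records into fixed columns; missing fields become None."""
    return [tuple(item.get(column) for column in columns) for item in items]


if __name__ == "__main__":
    print(field_names(records))
    for row in to_rows(records, ["id", "name", "email"]):
        print(row)
```

The `None` values that appear when the records are flattened are the cost of imposing a strict row-and-column model on data that does not follow one.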
