Tag - HDFS

Why Lambda Architecture in Big Data Processing

Due to the exponential growth of digitization, the world creates at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, and we can denote this as Big Data. Data is generated everywhere: social media sites, various sensors, satellites, purchase transactions, mobile devices, GPS signals and much more. With the advancement of technology, data generation shows no sign of slowing down; instead, it will keep growing in massive volume. All the major organizations, retailers,...


Apache Kafka: The Next-Generation Distributed Messaging System

In a Big Data project, the main challenge is to collect an enormous volume of data. We need distributed, high-throughput messaging systems to overcome it, and Apache Kafka is designed to address exactly this challenge. It was originally developed at LinkedIn Corporation and later became an Apache project. A messaging system is typically responsible for transferring data from one application to another, and a message is simply a bundle of data/information. To ingest huge volume of data into Hadoop...
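The basic job of a messaging system, moving messages from a producing application to a consuming one, can be sketched in plain Python. This is only an illustration under stated assumptions: `MessageBroker`, `publish` and `consume` are hypothetical names, not Kafka's client API, and the sketch is neither distributed nor durable.

```python
from collections import defaultdict, deque


class MessageBroker:
    """Hypothetical in-memory broker (NOT Kafka's API): holds messages per topic."""

    def __init__(self):
        # topic name -> FIFO queue of messages not yet consumed
        self._topics = defaultdict(deque)

    def publish(self, topic, message):
        # The producing application hands a message (a bundle of data) to the broker.
        self._topics[topic].append(message)

    def consume(self, topic, max_messages=10):
        # The consuming application pulls up to max_messages, oldest first.
        queue = self._topics[topic]
        batch = []
        while queue and len(batch) < max_messages:
            batch.append(queue.popleft())
        return batch


if __name__ == "__main__":
    broker = MessageBroker()
    # Producer side: e.g. a web application emitting click events.
    broker.publish("clicks", {"user": "u1", "page": "/home"})
    broker.publish("clicks", {"user": "u2", "page": "/cart"})
    # Consumer side: e.g. a job that ingests the events into Hadoop.
    for msg in broker.consume("clicks"):
        print(msg)
```

One real difference worth keeping in mind: this toy queue deletes a message once it is consumed, whereas Kafka keeps messages for a configurable retention period and lets each consumer track its own position (offset) in the log.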


Research Papers & Publications

1. Effective image analysis on twitter streaming using Hadoop Eco System on Amazon Web Service EC2
We have published a research paper on Hadoop and its ecosystem, based on a real-time case study, in the "International Journal of Advanced Research in Computer Science and Software Engineering".
ISSN: 2277 128X
Title: Effective Image Analysis on Twitter Streaming using Hadoop Eco System on Amazon Web Service EC2
Paper ID: V5I9-0359
Abstract: Twitter is becoming the...


Basic concept of Data Lake

The infographic on the left represents the basic concept of a Data Lake, where we can use the ELT approach (Extract, Load, then Transform) instead of the traditional ETL process (Extract, Transform, then Load). The ETL process belongs to traditional data warehousing systems, which follow a structured data format (rows and columns). By leveraging HDFS (Hadoop Distributed File System), we can build a data lake that stores data in any format for processing and analysis. Data can be loaded directly into the Lake...
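The ETL versus ELT difference can be sketched in a few lines of Python. This is a toy model under stated assumptions: a plain dictionary stands in for the HDFS-backed lake, and the file names, the `(id, amount)` schema and the helper functions are all hypothetical. ETL parses and types the data before storing it; ELT stores each raw file untouched and transforms it only when it is read for analysis.

```python
import csv
import io
import json

# Hypothetical raw inputs arriving in two different formats.
RAW_CSV = "id,amount\n1,10.5\n2,7.25\n"
RAW_JSON_LINES = '{"id": 3, "amount": 4.0, "note": "gift"}\n{"id": 4, "amount": 1.75}\n'


def etl(raw_csv):
    """ETL: transform into a fixed (id, amount) schema BEFORE loading."""
    warehouse = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        warehouse.append((int(row["id"]), float(row["amount"])))
    return warehouse


def elt_load(lake, name, raw_text):
    """ELT step 1: load the raw data as-is, whatever its format."""
    lake[name] = raw_text


def elt_transform(lake):
    """ELT step 2: parse and transform only when the data is read."""
    records = []
    for name, raw in lake.items():
        if name.endswith(".csv"):
            rows = csv.DictReader(io.StringIO(raw))
        else:  # this toy lake assumes every non-CSV file is JSON Lines
            rows = (json.loads(line) for line in raw.splitlines() if line.strip())
        for row in rows:
            records.append((int(row["id"]), float(row["amount"])))
    return records


if __name__ == "__main__":
    print(etl(RAW_CSV))
    lake = {}  # stands in for a directory in HDFS
    elt_load(lake, "sales.csv", RAW_CSV)
    elt_load(lake, "sales.jsonl", RAW_JSON_LINES)
    print(elt_transform(lake))
```

Because the lake keeps the raw JSON untouched, the extra `note` field is still available to a later analysis; in the ETL path, anything the warehouse schema did not anticipate is lost at load time.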


Semi-Structured Data

Semi-structured data lies between structured and unstructured data. Data stored in a traditional database system or an Excel sheet, organized in COLUMNS and ROWS, can be denoted as structured data. Unstructured data can be considered as any data or piece of information that can't be stored in databases/RDBMS. Emails, Facebook comments, newspapers, etc. are examples of unstructured data. Semi-structured data does not follow a strict data model structure and is neither raw data nor typed data in...
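JSON is a common example of semi-structured data (the excerpt does not name a format, so this choice is an assumption). Each record carries its own field names, but records need not share the same fields and values may be nested. The sketch below, with hypothetical sample records and a hypothetical `to_rows` helper, shows what happens when such records are forced into rows and columns.

```python
import json

# Hypothetical semi-structured records: self-describing keys,
# but no fixed schema, and some values are nested.
RAW = """
[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
  {"id": 3, "name": "Mei", "address": {"city": "Pune"}}
]
"""


def to_rows(records):
    """Project records onto a table whose columns are the union of top-level keys."""
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    # Missing fields become None, much like NULLs in a structured table.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


if __name__ == "__main__":
    columns, rows = to_rows(json.loads(RAW))
    print(columns)
    for row in rows:
        print(row)
```

The resulting table is mostly empty cells, and the list and nested object still do not fit a flat column, which is why semi-structured data sits between the two extremes rather than being treated as ordinary structured data.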
