Data Engineering

Steering the number of mappers (MapReduce) in Sqoop for parallel data ingestion into the Hadoop Distributed File System (HDFS)

To import data from most data sources, such as an RDBMS, Sqoop internally uses mappers. Before delegating the work to the mappers, Sqoop performs a few initial operations in sequence once we execute the command on a terminal on any node in the Hadoop cluster. Ideally, in a production environment, Sqoop is installed on a separate node and the .bashrc file is updated to append Sqoop's binary and configuration paths, which lets us execute Sqoop commands from anywhere in the multi-node cluster. Most of the...

Read more...
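
As a rough illustration of how the mapper count drives parallelism, the sketch below divides a numeric key range into one contiguous slice per mapper, so that each mapper could import its own slice independently. It is a simplified Python model written for this listing, not Sqoop's actual splitter; the function name `split_ranges` and the sample key range are hypothetical.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive key range [min_val, max_val] into contiguous
    inclusive slices, one per mapper (hypothetical helper, not Sqoop code)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must not be smaller than min_val")

    total = max_val - min_val + 1
    # Never create more slices than there are key values.
    slices = min(num_mappers, total)
    base, extra = divmod(total, slices)

    ranges = []
    start = min_val
    for i in range(slices):
        # The first `extra` slices take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Assumed example: primary keys 1..100 shared among 4 mappers.
    for low, high in split_ranges(1, 100, 4):
        print(f"mapper reads keys {low}..{high}")
```

With more mappers, each slice gets smaller and more slices can be read at the same time, at the cost of more concurrent connections to the source database.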

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS). Because of the distributed storage mechanism in HDFS, we can store data of any format in huge volumes. In an RDBMS, data persists in row-and-column format (known as structured data). To process huge volumes of enterprise data, we can leverage HDFS as a basic data lake. In this...

Read more...

Data Ingestion phase for migrating enterprise data into Hadoop Data Lake

Big Data solutions help extract valuable information for making accurate strategic business decisions. The exponential growth of digitalization, social media, telecommunication, etc. is fueling enormous data generation everywhere. Before processing a huge volume of data, we need an efficient, distributed data storage mechanism that can hold any form of data, from structured to unstructured. The Hadoop Distributed File System (HDFS) can be leveraged efficiently as a data lake by installing it on a multi-node cluster....

Read more...

Big Data Analytics in Banking Systems

Typically, banking systems are responsible for validating and verifying financial transaction data, geo-location data from mobile devices, merchant data, and authorization data, including submission data. Processing data from many social media channels and the bank's mainframe data center, and delivering the final output, is a significant challenge. The issue: legacy systems are incapable of processing data while it is in motion. Combining data of all the different formats (structured, semi-structured, and unstructured) is another challenge. Big data approach: Big data analytics...

Read more...

Why Lambda Architecture in Big Data Processing

Due to the exponential growth of digitization, the world is creating at least 2.5 quintillion (2,500,000,000,000 million) bytes of data every day, which we can call Big Data. Data is generated everywhere: social media sites, various sensors, satellites, purchase transactions, mobile devices, GPS signals, and much more. With the advancement of technology, data generation shows no sign of slowing down; instead, it will grow massively. All the major organizations, retailers,...

Read more...

Apache Kafka, The next Generation Distributed Messaging System

In a Big Data project, the main challenge is collecting an enormous volume of data. We need a distributed, high-throughput messaging system to overcome it. Apache Kafka is designed to address this challenge. It was originally developed at LinkedIn and later became an Apache project. A messaging system is typically responsible for transferring data from one application to another. A message is nothing but a bunch of data or information. To ingest a huge volume of data into Hadoop...

Read more...

Fog Computing

Fog computing is also referred to as edge computing. Cisco Systems introduced the term "Fog Computing", and it is not a replacement for cloud computing. Cloud computing refers to storing and accessing data and programs over the Internet instead of on a local computer's hard drive or storage. The cloud is simply a metaphor for the Internet. In fog computing, data, processing, and applications are concentrated in devices at the network edge. Here devices communicate peer-to-peer so that data storage and share...

Read more...

Mobile Phone Authentication and Fraud

Day by day, we are becoming addicted to mobile phones, especially smartphones, since a smartphone performs many of the functions of a computer and typically has a touch-screen interface and Internet access. With the extensive growth of mobile applications, we can use them for everything from games to financial transactions, including stock market brokerage. Many banks have launched their own mobile applications so that customers can download them and start financial transactions such as balance verification, money transfer...

Read more...

Basic concept of Data Lake

The infographic on the left represents the basic concept of a data lake, where we can use the ELT approach (extract, load, and then transform) instead of the traditional ETL process (extract, transform, and then load). The ETL process applies to traditional data warehousing systems, where data follows a structured (row and column) format. By leveraging HDFS (Hadoop Distributed File System), we can build a data lake to store data of any format for processing and analysis. Data can be loaded directly into the lake...

Read more...

Real time data analytics helps mobile service providers to achieve aggressive advantages

Smartphone usage has become an integral part of our daily routine. Besides phone calls and SMS, we are constantly engaged in many other activities, from entertainment to household shopping and social engagement, by installing various types of mobile applications. Of course, mobile internet is required for all of the above. Mobile service providers are facing new and difficult challenges. Due to the exponential growth of customers' expectations, they need to serve accordingly with advanced mobile technology and handle...

Read more...