Data Engineering

DYNAMICS OF ONLINE SOCIAL NETWORKS

Mathematical models for analyzing the spread of social influence have emerged as a major topic of interest among researchers in diverse disciplines such as sociology, economics and computer science. Empirical studies of diffusion on social networks date back to the 1940s, and theoretical propagation models were introduced in the late 1970s. In recent years, online social networks such as Facebook, Twitter and LinkedIn have experienced explosive growth and have remarkably changed the way people communicate...
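One widely studied propagation model is the linear threshold model, which grew out of threshold models of collective behavior from the late 1970s. The sketch below is only an illustration of that model, not necessarily the one the full article discusses. Each node becomes active once the summed influence weights of its already-active neighbours reach its threshold. The function name `linear_threshold` and the example graph are hypothetical, and thresholds are fixed here so that the run is deterministic (many formulations draw them at random).

```python
from typing import Dict, Set


def linear_threshold(
    in_weights: Dict[str, Dict[str, float]],
    thresholds: Dict[str, float],
    seeds: Set[str],
) -> Set[str]:
    """Run a deterministic linear threshold cascade and return the active set.

    in_weights[v][u] is the influence weight of neighbour u on node v.
    A node activates once the total weight of its active neighbours
    reaches its threshold; the loop stops when a full pass adds nobody.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, threshold in thresholds.items():
            if node in active:
                continue
            # Sum the influence exerted by neighbours that are already active.
            pressure = sum(
                w for u, w in in_weights.get(node, {}).items() if u in active
            )
            if pressure >= threshold:
                active.add(node)
                changed = True
    return active


# Hypothetical four-person network: a influences b, a and b influence c, c influences d.
weights = {"b": {"a": 0.6}, "c": {"a": 0.3, "b": 0.3}, "d": {"c": 0.4}}
limits = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}
print(linear_threshold(weights, limits, {"a"}))  # b and c activate; d's 0.4 stays below 0.5
```

Because activation is monotone (nodes never deactivate), the final active set does not depend on the order in which nodes are examined.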


Data storage mechanism in Facebook

Almost all of us are familiar with social media, especially Facebook, where photo uploads total 300 million per day and 4.5 billion likes are generated daily. Every 60 seconds, 510 comments are posted and 293,000 statuses are updated. It is natural to wonder how Facebook stores such a huge volume of data, which would be impractical to manage with a traditional relational database management system (RDBMS) alone. Facebook has used a distributed database management system called Cassandra, which it initially developed to enhance its Inbox search feature. In...
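Part of what lets a distributed database like Cassandra scale out is that it hashes each partition key onto a token ring (consistent hashing), so every node owns a slice of the key space and replicas are placed on the next nodes around the ring. The toy sketch below illustrates that idea only; it is not Cassandra's code. The node names and the `TokenRing` class are hypothetical, and it uses MD5 tokens (as Cassandra's older `RandomPartitioner` did) rather than the Murmur3 hashing that current Cassandra uses by default.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Map a key to a position on the ring using an MD5 hash (simplification)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Toy consistent-hashing ring: each node owns the range ending at its token."""

    def __init__(self, nodes: List[str]) -> None:
        # Sort nodes by their token so we can binary-search the ring.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, rf: int = 1) -> List[str]:
        """Return the rf nodes responsible for key, walking clockwise."""
        rf = min(rf, len(self._ring))
        # First node whose token is >= the key's token; wrap past the end.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(rf)]


ring = TokenRing(["node1", "node2", "node3"])
print(ring.replicas("user:42", rf=2))  # two distinct nodes hold copies of this row
```

Adding a node only takes over part of one neighbour's range, which is why this layout lets a cluster grow without reshuffling most of its data.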


Mainframe Applications slowly migrating to Hadoop

Large organizations across the globe use legacy mainframe systems because of their scalability, security and reliable processing capacity under heavy, large workloads. Of course, these infrastructures require huge hardware, software and processing capacity. As technology advances rapidly, mainframe technicians and developers are becoming scarcer, and it has become a major challenge for these organizations to continue their operations. Maintaining or replacing this hardware is another threat due to low...


Establishing a data lake for a multi-channel e-commerce application to understand customers’ buying patterns

Post-order-fulfillment data is becoming a very important asset for e-commerce vendors seeking to understand the complete buying patterns of their customers. This is especially true for vendors who sell many kinds of products, from electronics to apparel. Extraction and transformation are time-consuming operations when semi-structured data moves from various sources and finally lands in the relational data warehouse. Data extracted from social media is semi-structured (JSON or XML). For example, Facebook provides information in JSON format through its Graph API and same...
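The transformation step described above often comes down to flattening nested JSON into flat, column-like rows before they can land in a relational table. The sketch below shows one simple way to do that with Python's standard `json` module. The payload shape is a hypothetical example, not the actual Graph API response schema, and the dotted column naming (`from.name`) is just one convention chosen for illustration.

```python
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested objects into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested objects, extending the column prefix.
            flat.update(flatten(value, prefix=column + "."))
        elif isinstance(value, list):
            # A list has no single relational column; keep it as JSON text.
            flat[column] = json.dumps(value)
        else:
            flat[column] = value
    return flat


def to_rows(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON array of records and return one flat row per record."""
    return [flatten(r) for r in json.loads(raw)]


# Hypothetical social-media payload (not a real API response).
payload = '[{"id": "1", "from": {"name": "Asha"}, "message": "Great phone", "tags": ["electronics"]}]'
print(to_rows(payload))
```

Keeping lists as JSON text defers the decision of whether they deserve their own child table; a fuller design might explode them into a separate table keyed by `id`.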


How to understand a data pipeline easily

A data pipeline can be visualized as the extraction, transformation and loading of data into a storage area such as a database system or data warehouse. Data enters one end of this multi-stage process in a particular shape or form and comes out the other end in a different (desired) shape or form. The number of stages depends on the input data. There might be fewer stages if the input data is...
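The multi-stage idea can be sketched as a list of functions, each taking the previous stage's output as its input. In the minimal example below, the `customer,amount` input format and all function names are hypothetical, and "loading" is simulated by collecting the rows into a list rather than writing to a real warehouse.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]
Stage = Callable[[Iterable[Any]], Iterable[Any]]


def extract(lines: Iterable[str]) -> Iterable[Record]:
    """Extract: parse raw 'customer,amount' text lines into records."""
    for line in lines:
        customer, amount = line.strip().split(",")
        yield {"customer": customer, "amount": amount}


def drop_blank_amounts(records: Iterable[Record]) -> Iterable[Record]:
    """Transform: discard records whose amount field is empty."""
    return (r for r in records if r["amount"])


def cast_amounts(records: Iterable[Record]) -> Iterable[Record]:
    """Transform: convert amounts from text to numbers."""
    for r in records:
        yield {**r, "amount": float(r["amount"])}


def run_pipeline(source: Iterable[Any], stages: List[Stage]) -> List[Record]:
    """Feed the source through each stage in order, then 'load' the result."""
    data: Iterable[Any] = source
    for stage in stages:
        data = stage(data)
    # Load: materialize the output, standing in for an insert into a table.
    return list(data)


rows = run_pipeline(
    ["alice,10.5", "bob,", "carol,3"],
    [extract, drop_blank_amounts, cast_amounts],
)
print(rows)  # bob's row is dropped because its amount is blank
```

Because the stages are just a list, a cleaner input can skip a step (for example, dropping `drop_blank_amounts`), which mirrors the point that the number of stages depends on the input data.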


The Internet of Things (IoT)

The Internet of Things has started to gain momentum, with strong and bold promises of a revolutionary, fully connected "smart" world in which objects, their environment and people become more tightly intertwined. As a basic definition, IoT refers to an environment where network connectivity and computing capability extend to objects, sensors and everyday items not normally considered computers, allowing those devices to generate, exchange and consume data with or without minimal...


Why data is becoming the new fuel/oil and the primary driver of every type of business strategy

Everyday experience shows how companies use data analysis to boost revenue growth. Nowadays, most of us subscribe to Dish TV from well-known DTH (direct-to-home) service providers and choose packages that match our interests. However, a chosen package will not include every channel we want. Instead, service providers observe how frequently viewers watch each channel and gather the data...
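The first step of the analysis described above, measuring how often each channel is watched, can be as simple as counting viewing events. The sketch below assumes a hypothetical log of `(viewer_id, channel)` events and uses `collections.Counter` to rank channels. Real providers would work with far richer data (watch duration, time of day, region), so treat this as an illustration only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_channels(events: Iterable[Tuple[str, str]], k: int) -> List[Tuple[str, int]]:
    """Count how often each channel was watched and return the k most frequent."""
    counts = Counter(channel for _viewer, channel in events)
    return counts.most_common(k)


# Hypothetical viewing log: (viewer_id, channel) pairs.
log = [
    ("v1", "News24"),
    ("v2", "Sports1"),
    ("v1", "Sports1"),
    ("v3", "Sports1"),
    ("v2", "News24"),
    ("v3", "Movies"),
]
print(top_channels(log, 2))  # Sports1 (3 views) ranks above News24 (2 views)
```

Rankings like this could then inform which channels are bundled into which packages.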


Analysis of CCTV footage

CCTV is commonly used for a variety of purposes, such as maintaining perimeter security in medium- to high-security areas and defense installations, observing the behavior of incarcerated inmates and potentially dangerous patients in medical facilities, and traffic monitoring. A single camera generates about 20 gigabytes of digital data per day after compression. The number of cameras grows with the size of the premises and its surroundings. If we are planning to store that data...
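The 20 GB per camera per day figure makes storage planning a matter of simple multiplication. The sketch below is a back-of-the-envelope estimate. The camera count, retention period and function name are hypothetical. The optional `replication` parameter reflects the fact that HDFS keeps 3 copies of each block by default, which roughly triples raw capacity if the footage is archived on Hadoop.

```python
def storage_needed_tb(
    cameras: int,
    retention_days: int,
    gb_per_camera_day: float = 20.0,
    replication: int = 1,
) -> float:
    """Estimate raw storage in terabytes (1 TB = 1,000 GB) for a CCTV archive."""
    total_gb = cameras * retention_days * gb_per_camera_day * replication
    return total_gb / 1000


# Hypothetical site: 100 cameras, footage kept for 30 days.
print(storage_needed_tb(100, 30))                 # 60.0 TB of compressed footage
print(storage_needed_tb(100, 30, replication=3))  # 180.0 TB with 3-way replication
```

This ignores filesystem overhead and any further compression, so it should be read as a lower bound for capacity planning.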


Ingesting Big Data into HDFS

We often talk about Big Data processing using Hadoop, and we know the basic definition of Big Data: a volume of data so large that it cannot be stored in existing traditional databases or data repositories. So how can we import such a huge volume of data into the cluster of computers where Hadoop is installed? Using Flume, we can continuously collect streams of data; for example, Twitter data can be collected to analyze comments. Sqoop...
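A Flume agent is organized as a source that receives events, a channel that buffers them, and a sink that writes them out (for example, to HDFS). The toy Python imitation below shows how that decoupling enables continuous collection. It is not Flume's API: real Flume agents are configured in a properties file, not written in Python. The function names, batch size and event strings here are hypothetical, and each "file" is just a Python list.

```python
import queue
import threading
from typing import Iterable, List, Optional

_END = None  # sentinel marking the end of the event stream


def run_source(events: Iterable[str], channel: "queue.Queue[Optional[str]]") -> None:
    """Source: push each incoming event (e.g. a tweet) into the channel."""
    for event in events:
        channel.put(event)  # blocks if the bounded channel is full
    channel.put(_END)


def run_sink(channel: "queue.Queue[Optional[str]]", batch_size: int, files: List[List[str]]) -> None:
    """Sink: drain the channel and write events out in fixed-size batches."""
    batch: List[str] = []
    while True:
        event = channel.get()
        if event is _END:
            break
        batch.append(event)
        if len(batch) == batch_size:
            files.append(batch)  # stands in for rolling one file on HDFS
            batch = []
    if batch:
        files.append(batch)  # flush the final partial batch


def ingest(events: Iterable[str], batch_size: int = 2, capacity: int = 100) -> List[List[str]]:
    """Run one source thread and one sink thread connected by a bounded channel."""
    channel: "queue.Queue[Optional[str]]" = queue.Queue(maxsize=capacity)
    files: List[List[str]] = []
    producer = threading.Thread(target=run_source, args=(events, channel))
    consumer = threading.Thread(target=run_sink, args=(channel, batch_size, files))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return files


print(ingest(["tweet-1", "tweet-2", "tweet-3"]))
```

The bounded channel lets the source and sink run at different speeds: a burst of incoming events is buffered instead of overwhelming the writer.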


Big Data generation statistics

Facebook collects more than 500 terabytes (500 × 1,000 gigabytes) of our data a day, including 2.9 billion Likes. More than 350 million photos are uploaded per day. Google processes 24 petabytes (24 × 1,000,000,000,000,000 bytes) of data per day, and 400 hours of video are uploaded to YouTube every minute. The Apache Cassandra database is the right choice when we need scalability and high availability without compromising performance. Facebook has used it for its Inbox search. Initially Cassandra was developed by...
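Daily totals are easier to grasp when converted to sustained rates. The short calculation below uses decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and the figures quoted above. It gives average rates only, since real traffic is bursty, and the variable names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
TB = 10**12  # decimal terabyte, in bytes
PB = 10**15  # decimal petabyte, in bytes


def per_second(bytes_per_day: float) -> float:
    """Convert a daily volume in bytes to an average rate in bytes per second."""
    return bytes_per_day / SECONDS_PER_DAY


facebook_rate = per_second(500 * TB)   # about 5.8e9 bytes/s (~5.8 GB/s)
google_rate = per_second(24 * PB)      # about 2.8e11 bytes/s (~278 GB/s)
youtube_hours_per_day = 400 * 60 * 24  # 576,000 hours of video per day

print(f"Facebook: {facebook_rate / 1e9:.1f} GB/s")
print(f"Google:   {google_rate / 1e9:.1f} GB/s")
print(f"YouTube:  {youtube_hours_per_day:,} hours of video per day")
```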
