Effective image analysis on twitter streaming using Hadoop Eco System on Amazon Web Service EC2
We have published a research paper on Hadoop and its ecosystem, based on a real-time case study, in the “International Journal of Advanced Research in Computer Science and Software Engineering” (ISSN: 2277-128X).
Title
Effective Image Analysis on Twitter Streaming using Hadoop Eco System on Amazon Web Service EC2
Paper ID
V5I9-0359
Abstract:
Twitter has become one of the most popular online microblogging networks for real-time posts. It enables users to send and read short 140-character messages called “tweets”. Registered users can read and post tweets, but unregistered users can only read them. Today’s Twitter is less focused on “what are you doing?” and has emerged as a source of discovery, with a focus on sharing relevant information and engaging in conversation. Sharing visual information in the form of images/photos is becoming very popular, because all followers can instantly see which images/photos have been posted/tweeted. In this paper, I explain how effectively registered users share/upload images among their followers. This information/statistics would be of great value to any organization/company launching a new product in the market: if a particular image/photo is widely shared in the Twitter community, the organization/company can take this as a sign that its product is penetrating further into the market. Here I have analyzed the momentum of visual information propagation: as an image spreads, followers become aware that something new has been launched in the market and, in turn, become curious to dig deeper into it. In this paper, the collected Twitter streaming data (a rich amount of data in semi-structured JSON format, gathered over an interval of time), referred to as big data, is processed efficiently to achieve the stated output. With traditional software such as an RDBMS or an MVC architecture framework like Struts, it is not feasible to achieve this goal. To leverage cloud computing, I have used Amazon Web Service EC2, where a Hadoop cluster has been created to analyze the Twitter streaming data. Other components of the Hadoop ecosystem, viz. Apache Flume and Hive, have also been used.
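The abstract does not define how “momentum” is measured. As one possible reading, offered only as a sketch, momentum can be approximated by counting how often each image is shared per hour, so that a rising count signals faster propagation. The Python below assumes the input has already been reduced to (image URL, `created_at`) pairs, with timestamps in the layout Twitter’s v1.1 API uses for `created_at`. The function name `share_momentum`, the hourly bucket, and the sample URLs are illustrative assumptions, not the paper’s method or data.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Layout of the "created_at" field in Twitter's v1.1 tweet JSON,
# e.g. "Mon Sep 07 10:05:00 +0000 2015".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def share_momentum(shares):
    """Count how many times each image is shared in each hour.

    `shares` is an iterable of (image_url, created_at) pairs.
    Returns {image_url: [(hour_start, count), ...]} ordered by hour.
    Hypothetical helper: hourly counts are just one way to read "momentum".
    """
    per_image = defaultdict(Counter)
    for image_url, created_at in shares:
        ts = datetime.strptime(created_at, TWITTER_TIME_FORMAT)
        # Truncate to the start of the hour to form a time bucket.
        hour_start = ts.replace(minute=0, second=0, microsecond=0)
        per_image[image_url][hour_start] += 1
    return {url: sorted(counts.items()) for url, counts in per_image.items()}


if __name__ == "__main__":
    sample = [  # hypothetical shares of one image
        ("https://example.com/a.jpg", "Mon Sep 07 10:05:00 +0000 2015"),
        ("https://example.com/a.jpg", "Mon Sep 07 10:40:12 +0000 2015"),
        ("https://example.com/a.jpg", "Mon Sep 07 11:02:30 +0000 2015"),
    ]
    for hour, count in share_momentum(sample)["https://example.com/a.jpg"]:
        print(hour.isoformat(), count)
```

In the cluster setting described above, the same grouping would more naturally be expressed as a Hive `GROUP BY` over the stored records; the in-memory version only makes the counting logic concrete.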
Using Flume, the Twitter stream has been collected for a particular interval of time and subsequently stored in the Hadoop Distributed File System (HDFS) for further analysis, since traditional RDBMSs are not well suited to this semi-structured data. I have used Hive to mine the stored data after it has been filtered through a MapReduce phase. Only the Map phase has been used, to parse the semi-structured streaming data (JSON).
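The map-only parsing step can be sketched as a Hadoop Streaming mapper in Python. This is an assumption: the paper does not say whether its mapper was written in Java or run through Streaming, and the field names used here (`created_at`, `user.screen_name`, `entities.media`, `retweeted_status`) come from Twitter’s v1.1 tweet JSON rather than from the paper. The sketch also assumes Flume has landed one JSON tweet per line in HDFS. Each tweet carrying an image becomes one tab-separated line (image URL, timestamp, screen name), and malformed events are skipped instead of failing the task.

```python
import json
import sys


def extract_media(tweet):
    """Yield image URLs attached to a tweet (or to the tweet it retweets)."""
    # For a retweet, the image is attached to the original status.
    source = tweet.get("retweeted_status") or tweet
    entities = source.get("entities") or {}
    for media in entities.get("media") or []:
        url = media.get("media_url_https") or media.get("media_url")
        if url:
            yield url


def map_lines(lines):
    """Map-only step: turn raw JSON tweets into tab-separated records."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            # Partial or corrupt events are dropped rather than failing the task.
            continue
        if not isinstance(tweet, dict):
            continue
        created_at = tweet.get("created_at", "")
        user = (tweet.get("user") or {}).get("screen_name", "")
        # Stream notices such as {"delete": ...} carry no media and emit nothing.
        for url in extract_media(tweet):
            yield "\t".join((url, created_at, user))


if __name__ == "__main__":
    # Hadoop Streaming feeds input records on stdin and collects stdout.
    for record in map_lines(sys.stdin):
        print(record)
```

With Hadoop Streaming, passing `-D mapreduce.job.reduces=0` makes the job map-only, so the mapper’s output is written directly to HDFS with no shuffle or reduce. Tab-separated output of this kind can then be exposed to Hive through a table declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'`.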