Tag - Hadoop

A few intrinsics of Apache ZooKeeper and their importance

From a bird's-eye view, Apache ZooKeeper provides coordination services for managing distributed applications. It is responsible for providing configuration information, naming, synchronization, and group services over large clusters in distributed systems. For example, Apache Kafka uses ZooKeeper to choose the leader node for its topic partitions. Please click here if you want to read about how to set up a multi-node Apache ZooKeeper cluster on Ubuntu/Linux. zNodes: The key concept of ZooKeeper is the znode, which can be acted...

Read more...

Real-time Distributed Data-streaming with Kafka

Originally written in Scala and Java, Apache Kafka is a fast, horizontally scalable, fault-tolerant messaging platform for distributed data streaming that was first developed at LinkedIn. It provides a publish-subscribe mechanism for processing and storing data streams in a fault-tolerant way. It is used to build real-time data pipelines that stream social data, geospatial data, or sensor data from various devices. Kafka plugs into Spark, Hadoop, Storm, HBase, Flink, and many others for big data analytics. Using...

Read more...

Network Topology To Create Multi Node Hybrid Cluster For Hadoop Installation

The aim of this article is to outline a network topology for installing Hadoop on a multi-node hybrid cluster with limited hardware resources. Such a cluster is useful for learning Hadoop and for processing lower volumes of unstructured data with various engines. Before the cluster setup: We installed Hadoop on a single-node cluster running Ubuntu 14.04 on top of Windows 10 using VMware Workstation Player. Later we copied the .vmx file into multiple...

Read more...

Apache Flink – A 4G Data Processing Engine

Analyzing streaming data in large-scale systems is increasingly a focal point for making accurate business decisions, owing to the mushrooming of digital data sources around the globe, including social media. Real-time analytics is becoming more attractive because of the possibility of gaining insights from the time value of data (in other words, while the data is in motion). Apache Flink, an open-source stream processing engine, helps to take advantage of stream-based approaches. Besides...

Read more...

Transfer structured data from Oracle to Hadoop storage system

Using Apache Sqoop, we can transfer structured data from a relational database management system (RDBMS) to the Hadoop Distributed File System (HDFS). Because HDFS stores data in a distributed manner, it can hold data of any format in huge volumes. In an RDBMS, data persists in rows and columns (known as structured data). To process huge volumes of enterprise data, we can leverage HDFS as a basic data lake. In this...

Read more...

Data Ingestion phase for migrating enterprise data into Hadoop Data Lake

Big Data solutions help extract valuable information to support accurate strategic business decisions. The exponential growth of digitalization, social media, telecommunications, and so on is fueling enormous data generation everywhere. Before processing huge volumes of data, we need an efficient distributed storage mechanism that can hold any form of data, from structured to unstructured. The Hadoop Distributed File System (HDFS) can be leveraged efficiently as a data lake by installing it on a multi-node cluster....

Read more...

Technical Leadership Training

Topics, Training Details, Duration
Computer Basics: Basics of Computers and Programming
C: C Basic, C Advanced
C++: C++ Basic, C++ Advanced
Microsoft Technologies: ASP.NET, SharePoint, Windows Device Driver Development
Java/J2EE: Java, JSP, Servlets, Struts, Spring, Hibernate
Oracle Stacks: Oracle ADF, WCS
TIBCO: Tibco BW, Active Matrix
Agile: Scrum, Agile
Web: Web Development, HTML, Ajax, AngularJS, jQuery
LAMP & CMS: PHP, MySQL, Joomla, WordPress, Drupal
Mobility: HTML5, Android, iOS
E-Commerce: Oracle-ATG Commerce, DemandWare Cloud Commerce, Magento Commerce
Big Data & Hadoop: - Introduction to Big Data and Data Analytics - Overview of Hadoop - In-depth knowledge of HDFS (Hadoop Distributed File System) - MapReduce - Customization of Hadoop framework -...

Read more...
Technical Capabilities

Future of Big Data Analysis Using Hadoop

Even though medical science can diagnose diseases such as cancer and Alzheimer's, these diseases remain incurable. To find their root causes, medical researchers need to analyze patients' medical records, various supporting information, and the climatic conditions in which patients lived, across different geographical locations. They need a platform where a huge volume of data can be stored and analyzed. Hadoop is a powerful platform that allows us to store huge...

Read more...

Big Data Explosion

After the fire tragedy at Kerala's Puttingal Devi Temple, we could see a sudden explosion of data across digital media. After that tragic incident, a huge amount of data was generated as text, voice, photos, video, blogs, and so on, on the internet via social media, news channels, and e-newspapers, and comments, sentiments, and opinions flooded in on whether bursting firecrackers should be allowed at places of worship. This is a classic example of Big Data where existing traditional software are...

Read more...