Monthly Archives - March 2019

Error while batch processing rest data persisted in a basic Hadoop-based (HDFS) Data Lake: “Permission denied: user=dr.who, access=READ_EXECUTE, inode="/tmp":hdadmin:supergroup:drwx……..”

Typically, persisting unstructured data and batch processing it later can be very costly, which makes it inadvisable for small organizations and startups, for whom cost is a prime factor. A Hadoop-based Data Lake using MapReduce fits this scenario well: it is not only cost-effective but also scalable and easy to extend. Though it may sound like a great option, we might face issues while setting it up, and one of the common issues is the error "Permission...
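
To see why the request is refused, it helps to recall that HDFS checks permissions in a POSIX-like way: exactly one of the owner, group or other permission triples applies to a given user. `dr.who` is the default value of `hadoop.http.staticuser.user`, the identity the Hadoop web UIs use when no other user is supplied. The sketch below is a simplified, hypothetical model of that check (it ignores ACLs and the HDFS superuser); `is_allowed`, `required_bits` and the example modes are illustrative assumptions, and since the title elides the full mode of `/tmp`, `drwx------` is only an assumed value.

```python
# Hypothetical, simplified model of a POSIX-style permission check.
# It ignores HDFS ACLs, sticky bits and the superuser (who bypasses checks).

ACTION_BITS = {"READ": "r", "WRITE": "w", "EXECUTE": "x"}


def required_bits(access: str) -> str:
    """Translate an action name such as 'READ_EXECUTE' into bits like 'rx'."""
    if access == "ALL":
        return "rwx"
    if access == "NONE":
        return ""
    return "".join(ACTION_BITS[part] for part in access.split("_"))


def is_allowed(user: str, user_groups: set, owner: str, group: str,
               mode: str, access: str) -> bool:
    """Return True if `user` may perform `access` on an inode with `mode`.

    `mode` is a 10-character string such as 'drwxr-x---'. Only one class
    applies: owner first, then group, then other.
    """
    if len(mode) != 10:
        raise ValueError(f"expected a 10-character mode, got {mode!r}")
    if user == owner:
        bits = mode[1:4]
    elif group in user_groups:
        bits = mode[4:7]
    else:
        bits = mode[7:10]
    return all(bit in bits for bit in required_bits(access))


if __name__ == "__main__":
    # User, owner, group and action come from the error message;
    # both modes are assumed for illustration.
    print(is_allowed("dr.who", set(), "hdadmin", "supergroup", "drwx------", "READ_EXECUTE"))  # False
    print(is_allowed("dr.who", set(), "hdadmin", "supergroup", "drwxr-xr-x", "READ_EXECUTE"))  # True
```

In this model, `dr.who` falls into the "other" class, so the request only succeeds once that class includes `r-x`, which is the kind of change widening a directory's mode (for example with `hdfs dfs -chmod`) would make.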

Data Analysis Vs Predictive Analysis Vs Artificial Intelligence

What is Artificial Intelligence? In simple words, artificial intelligence, sometimes called machine intelligence, is intelligence demonstrated by machines: the ability to learn from experience, adjust to new inputs and perform human-like tasks. Data Analysis vs Predictive Analysis vs Artificial Intelligence: data analysis refers to reviewing data from past events to find patterns. Predictive analysis makes assumptions and tests them against past data to predict future what-ifs. AI, through machine learning, analyzes data, makes assumptions, learns and provides predictions at a scale and depth of detail...
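
The difference between the first two can be sketched in a few lines of Python. This is a hypothetical illustration with made-up monthly figures, not the article's method: `describe` reviews past values (data analysis), while `predict_next` fits an ordinary least-squares trend line to the same history and extrapolates one step ahead (a very simple form of predictive analysis). An AI or machine-learning system would go further by refitting and adapting as new data arrives, which this sketch does not attempt.

```python
from statistics import mean


def describe(values):
    """Data analysis: summarize what already happened."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def fit_line(values):
    """Fit y = a + b*x by ordinary least squares, with x = 0, 1, ..., n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a line")
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(values)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b


def predict_next(values):
    """Predictive analysis: extrapolate the fitted trend to the next period."""
    a, b = fit_line(values)
    return a + b * len(values)


if __name__ == "__main__":
    monthly_sales = [10, 12, 14, 16]  # made-up history
    print(describe(monthly_sales))
    print(predict_next(monthly_sales))  # 18.0
```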

Resolved – “Incompatible clusterIDs in…” in Multi-Node Hadoop Cluster Setup

Currently, many startups and small companies, along with their customers, are working on data analytics, ML, AI and related solutions. Due to budget constraints, some of them prefer not to use cloud-based storage and instead build a basic Data Lake on HDFS to process ingested data. During this process, they might encounter the exception "org.apache.hadoop.hdfs.server.common.Storage: java.io.IOException: Incompatible clusterIDs in /home/...." while starting the NameNode (master node) in a multi-node Hadoop cluster. This may occur in the following scenarios: ...
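
The clusterID mentioned in this exception is recorded in a `VERSION` file under `current/` in each NameNode and DataNode storage directory, i.e. the directories configured by `dfs.namenode.name.dir` and `dfs.datanode.data.dir`. A minimal sketch for comparing the two, assuming you have already read the contents of both files, might look like the following. The helper names and sample IDs are hypothetical, and the parser handles only the simple `key=value` lines these files use, not the full Java properties syntax.

```python
def read_version(text: str) -> dict:
    """Parse the key=value lines of a Hadoop storage VERSION file.

    Lines starting with '#' (such as the timestamp header) are skipped.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cluster_ids_match(namenode_text: str, datanode_text: str) -> bool:
    """True only if both files declare the same, non-empty clusterID."""
    nn_id = read_version(namenode_text).get("clusterID")
    dn_id = read_version(datanode_text).get("clusterID")
    return bool(nn_id) and nn_id == dn_id


if __name__ == "__main__":
    # Hypothetical file contents; real files contain more keys and real IDs.
    namenode_version = "# example header\nclusterID=CID-1111\nstorageType=NAME_NODE\n"
    datanode_version = "# example header\nclusterID=CID-2222\nstorageType=DATA_NODE\n"
    print(cluster_ids_match(namenode_version, datanode_version))  # False
```

Reading the real files would only mean passing `pathlib.Path(...).read_text()` for each storage directory; the actual paths depend on your configuration.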

Network Topology To Create Multi-Node Hybrid Cluster For Hadoop Installation

The aim of this article is to outline a network topology for installing Hadoop on a multi-node hybrid cluster when available hardware resources are limited. Such a cluster is useful for learning Hadoop and for processing smaller volumes of unstructured data with various engines. Before the cluster setup, we installed Hadoop as a single-node cluster on Ubuntu 14.04, running on top of Windows 10 in VMware Workstation Player. Later, we copied the .vmx file into multiple...
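
Because the nodes in this kind of setup start as copies of one virtual machine, each copy needs its own hostname and IP address, and every node must be able to resolve the others. As a hypothetical sketch (hostnames, addresses and roles are placeholders, not the article's actual topology), the snippet below checks an inventory for duplicates and renders `/etc/hosts` lines plus the worker hostnames that Hadoop reads from `etc/hadoop/workers` (named `slaves` in Hadoop 2.x).

```python
# Hypothetical inventory: replace with the real addresses of your VMs.
NODES = [
    ("192.168.1.10", "hadoop-master", "master"),
    ("192.168.1.11", "hadoop-worker1", "worker"),
    ("192.168.1.12", "hadoop-worker2", "worker"),
]


def validate(nodes):
    """Raise ValueError if cloned VMs still share an IP address or hostname."""
    for index, label in ((0, "IP address"), (1, "hostname")):
        seen = set()
        for node in nodes:
            if node[index] in seen:
                raise ValueError(f"duplicate {label}: {node[index]}")
            seen.add(node[index])


def hosts_entries(nodes):
    """Render lines suitable for appending to /etc/hosts on every node."""
    return "".join(f"{ip}\t{name}\n" for ip, name, _ in nodes)


def workers_file(nodes):
    """Render the worker list, one hostname per line."""
    return "".join(f"{name}\n" for _, name, role in nodes if role == "worker")


if __name__ == "__main__":
    validate(NODES)
    print(hosts_entries(NODES), end="")
    print(workers_file(NODES), end="")
```

Whether the master also runs a DataNode is a design choice; this sketch leaves it out of the worker list.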