Performance of Hadoop Map-Reduce
The performance of a Hadoop MapReduce job can often be improved considerably without spending more on hardware, simply by tuning a few parameters according to the cluster specification, the input data size, and the processing complexity.
Here are a few general tips to improve MapReduce job performance:
– Always compress intermediate data (mapper output) before it is written to disk and shuffled (see the compression sketch after this list).
– Add a combiner wherever the aggregation logic allows it (see the combiner sketch after this list).
– Do not use LongWritable as an output type when the output values fall within the integer range; IntWritable is the right choice (see the mapper sketch after this list).
– A proper diagnostic and profiling tool plays an important role when tuning the cluster configuration.
– Increase the HDFS block size of the input dataset to 256 MB or even 512 MB when a job has more than 1 TB of input. This reduces the number of map tasks, but if the cluster is small, each data node then needs a fairly powerful CPU to process the larger splits (see the block-size sketch after this list).
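
Compression of intermediate data is enabled in the job configuration. A minimal driver sketch, assuming a Hadoop 2.x cluster with the native Snappy libraries installed; Snappy is only one common codec choice, and the class name CompressedShuffleDriver is a placeholder:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.mapreduce.Job;

    public class CompressedShuffleDriver {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Compress mapper output before it is spilled to disk and shuffled.
            conf.setBoolean("mapreduce.map.output.compress", true);
            // Snappy trades a little compression ratio for very low CPU cost.
            conf.setClass("mapreduce.map.output.compress.codec",
                    SnappyCodec.class, CompressionCodec.class);

            Job job = Job.getInstance(conf, "job with compressed intermediate data");
            // ... set mapper, reducer, input/output paths as usual ...
        }
    }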
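A combiner is only safe when the reduce function is commutative and associative (sums, counts, max, and similar). A short sketch, assuming a hypothetical word-count style job whose TokenizerMapper emits (word, 1) pairs and whose IntSumReducer simply sums the values:

    // Because IntSumReducer only sums values, the same class can run as a
    // combiner on the map side, shrinking the data shuffled over the network.
    Job job = Job.getInstance(conf, "word count with combiner");
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);   // local, map-side pre-aggregation
    job.setReducerClass(IntSumReducer.class);    // final aggregation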
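To illustrate the output-type point, here is a sketch of a hypothetical CountMapper that emits per-word counts. A count of 1 easily fits in an int, so IntWritable (4 bytes per value) is the right choice; LongWritable would waste 4 bytes on every record in the shuffle:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Values fit comfortably in an int, so IntWritable is used for the output.
    public class CountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }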
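The block size is fixed when a file is written into HDFS, so it must be set before the input is loaded, not at job-submission time. A sketch of one way to do this, assuming the input is copied in by a small Java utility; the 256 MB value, the paths, and the class name are placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LoadInputWithLargeBlocks {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Use 256 MB blocks for the dataset being loaded.
            // Block size is a per-file property, decided at write time.
            conf.setLong("dfs.blocksize", 256L * 1024 * 1024);

            FileSystem fs = FileSystem.get(conf);
            // Placeholder paths: copy a local file into HDFS with the larger block size.
            fs.copyFromLocalFile(new Path("/local/data/input.txt"),
                                 new Path("/user/hadoop/input/input.txt"));
            fs.close();
        }
    }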