Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach

Kumar, Sandeep ; Padakandla, Sindhu ; Chandrashekar, L. ; Parihar, Priyank ; Gopinath, K. ; Bhatnagar, Shalabh (2017) Scalable Performance Tuning of Hadoop MapReduce: A Noisy Gradient Approach. In: IEEE 10th International Conference on Cloud Computing (CLOUD), 25-30 June 2017, Honolulu, HI, USA.

Full text not available from this repository.

Official URL: http://doi.org/10.1109/CLOUD.2017.55

Related URL: http://dx.doi.org/10.1109/CLOUD.2017.55

Abstract

Hadoop MapReduce is a popular framework for distributed storage and processing of large datasets and is used for big data analytics. It has various configuration parameters that play an important role in determining performance, i.e., the execution time of a given big data processing job. Default values of these parameters do not result in good performance, so it is important to tune them. However, tuning is inherently difficult for two reasons: first, the parameter search space is large, and second, there are cross-parameter interactions. Hence, there is a need for a dimensionality-free method that can automatically tune the configuration parameters while taking the cross-parameter dependencies into account. In this paper, we propose a novel Hadoop parameter tuning methodology based on a noisy gradient algorithm known as simultaneous perturbation stochastic approximation (SPSA). The SPSA algorithm tunes the selected parameters by directly observing the performance of the Hadoop MapReduce system. The approach is independent of the parameter dimension and requires only 2 observations per iteration while tuning. We demonstrate the effectiveness of our methodology in achieving good performance on popular Hadoop benchmarks, namely Grep, Bigram, Inverted Index, Word Co-occurrence and Terasort. When tested on a 25-node Hadoop cluster, our method shows a 45-66% decrease in the execution time of Hadoop jobs on average compared to prior methods. Further, our experiments indicate that the parameters tuned by our method are resilient to changes in the number of cluster nodes, which makes our method suitable for optimizing Hadoop when it is provided as a service on the cloud.
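
The "2 observations per iteration" property can be illustrated with a minimal sketch of a generic SPSA loop. This is not the authors' implementation: the function names, the gain constants, the decay exponents (0.602 and 0.101, values commonly suggested in the SPSA literature) and the toy quadratic objective are all assumptions made for illustration. In a real setting the objective would run a Hadoop job with the candidate configuration and return its measured execution time.

```python
import random


def spsa_minimize(objective, theta, iterations=500, a=0.1, c=0.1, seed=0):
    """Generic SPSA loop (illustrative sketch, not the paper's code).

    Each iteration perturbs ALL coordinates at once with a random +/-1
    vector and evaluates the objective exactly twice, regardless of how
    many parameters there are.
    """
    rng = random.Random(seed)
    theta = list(theta)
    for k in range(1, iterations + 1):
        a_k = a / k ** 0.602  # step size (assumed decay schedule)
        c_k = c / k ** 0.101  # perturbation size (assumed decay schedule)
        delta = [rng.choice((-1.0, 1.0)) for _ in theta]
        plus = [t + c_k * d for t, d in zip(theta, delta)]
        minus = [t - c_k * d for t, d in zip(theta, delta)]
        # The only two observations made in this iteration.
        diff = (objective(plus) - objective(minus)) / (2.0 * c_k)
        # Gradient estimate for coordinate i is diff / delta[i].
        theta = [t - a_k * diff / d for t, d in zip(theta, delta)]
    return theta


# Hypothetical stand-in for "run the job and measure execution time":
# a quadratic with its minimum at TARGET, plus a call counter.
TARGET = [0.3, 0.7, 0.5]
calls = {"n": 0}


def toy_runtime(theta):
    calls["n"] += 1
    return sum((t - g) ** 2 for t, g in zip(theta, TARGET))


tuned = spsa_minimize(toy_runtime, [0.0, 0.0, 0.0], iterations=500)
evaluations_used = calls["n"]  # two per iteration, independent of dimension
```

Adding more parameters to `theta` leaves the number of objective evaluations per iteration unchanged, which is what makes this family of methods attractive when each evaluation is an expensive job run.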

Item Type: Conference or Workshop Item (Paper)
Source: Copyright of this article belongs to Institute of Electrical and Electronics Engineers.
Keywords: Hadoop Parameter Tuning; Simultaneous Perturbation Stochastic Approximation; Cloud Computing.
ID Code: 116643
Deposited On: 12 Apr 2021 07:17
Last Modified: 12 Apr 2021 07:17
