Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters
Document Type
Article
Publication Date
4-1-2017
Abstract
The MapReduce framework and its open source implementation Hadoop have become the de facto platform for scalable analysis on large data sets in recent years. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilization as well as long completion length. Motivated by this, we propose simple yet effective schemes which use the slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set of jobs. By leveraging the workload information of recently completed jobs, our schemes dynamically allocate resources (or slots) to map and reduce tasks. We implemented the presented schemes in Hadoop V0.20.2 and evaluated them with representative MapReduce benchmarks on Amazon EC2. The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads.
DOI
10.1109/TCC.2015.2415802
Montclair State University Digital Commons Citation
Yao, Yi; Wang, Jiayin; Sheng, Bo; Tan, Chiu C.; and Mi, Ningfang, "Self-Adjusting Slot Configurations for Homogeneous and Heterogeneous Hadoop Clusters" (2017). Department of Computer Science Faculty Scholarship and Creative Works. 536.
https://digitalcommons.montclair.edu/compusci-facpubs/536