Department of Computer Science Faculty Scholarship and Creative Works

EA2S2: An Efficient Application-Aware Storage System for Big Data Processing in Heterogeneous Clusters

Teng Wang, University of Massachusetts Boston
Jiayin Wang, Montclair State University
Son Nam Nguyen, University of Massachusetts Boston
Zhengyu Yang, Northeastern University
Ningfang Mi, Northeastern University
Bo Sheng, University of Massachusetts Boston

Document Type

Conference Proceeding

Publication Date

9-14-2017

Abstract

Big data processing frameworks such as Hadoop have been widely adopted to process large volumes of data. Much prior work has focused on the allocation of resources and the execution order of jobs/tasks to improve performance in a homogeneous cluster. In this paper, we investigate storage layer design in a heterogeneous system, considering a new type of bundled job in which the input data and the associated application jobs are submitted together. Our goal is to break the barrier between resource management and the underlying storage layer, and to improve data locality, an important performance factor for resource management, from the perspective of the storage system. We develop a sampling-based randomized algorithm for the network file system to determine the placement of input data blocks. The main idea is to query a selected set of candidate nodes and estimate their workload at run time by combining centralized and per-node information. The node with the smallest workload is selected to host the data block. Our evaluation is based on a system implementation and comprehensive experiments on NSF CloudLab platforms. We have also conducted simulations for large-scale clusters. The results show significant performance improvements in terms of execution time and data locality.
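
The placement idea described in the abstract (sample a set of candidate nodes, estimate each one's workload, and host the block on the least-loaded candidate) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` fields, the `pending_central` map, and the workload formula (queued work divided by relative node speed) are all assumptions made here for illustration, since the abstract does not specify how centralized and per-node information are combined.

```python
import random
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    speed: float        # hypothetical per-node info: relative processing speed
    local_queue: float  # hypothetical per-node info: work already queued locally


def estimate_workload(node, pending_central):
    """Assumed estimate: (centrally tracked pending work + local queue) / speed."""
    return (pending_central.get(node.name, 0.0) + node.local_queue) / node.speed


def place_block(nodes, pending_central, sample_size, rng):
    """Sample candidate nodes at random and return the one with the smallest estimate."""
    candidates = rng.sample(nodes, min(sample_size, len(nodes)))
    return min(candidates, key=lambda n: estimate_workload(n, pending_central))


# Hypothetical heterogeneous cluster: three nodes with different speeds and queues.
nodes = [
    Node("a", speed=1.0, local_queue=4.0),
    Node("b", speed=2.0, local_queue=4.0),
    Node("c", speed=4.0, local_queue=20.0),
]
# Hypothetical centralized view of work already assigned to each node.
pending_central = {"a": 2.0, "b": 2.0}

chosen = place_block(nodes, pending_central, sample_size=2, rng=random.Random(0))
print(chosen.name)
```

Sampling only a few candidates per block keeps the number of run-time queries small; when `sample_size` equals the cluster size, the sketch degenerates to an exhaustive search for the least-loaded node.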

DOI

10.1109/ICCCN.2017.8038371

Montclair State University Digital Commons Citation

Wang, Teng; Wang, Jiayin; Nguyen, Son Nam; Yang, Zhengyu; Mi, Ningfang; and Sheng, Bo, "EA2S2: An Efficient Application-Aware Storage System for Big Data Processing in Heterogeneous Clusters" (2017). Department of Computer Science Faculty Scholarship and Creative Works. 239.
https://digitalcommons.montclair.edu/compusci-facpubs/239

This document is currently not available here.
