QoS-Aware Data Placement for MapReduce Applications in Geo-Distributed Data Centers

Cited by: 10
Authors
Chen, Wuhui [1, 2]
Liu, Baichuan [1, 2]
Paik, Incheon [3]
Li, Zhenni [4]
Zheng, Zibin [1, 2]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510085, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou 510085, Peoples R China
[3] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9650006, Japan
[4] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data centers; Quality of service; Data transfer; Distributed databases; Data models; Optimization; Network topology; Big-data processing; data placement; geo-distributed data centers; QoS aware; BIG DATA; CLOUD;
DOI
10.1109/TEM.2020.2971717
Chinese Library Classification (CLC) number
F [Economics];
Discipline classification code
02;
Abstract
With growing data volumes and the scaling of data center clusters, communication resources often become a bottleneck in service provisioning for many MapReduce applications (e.g., training machine learning models). Therefore, data placements that bring data blocks closer to data consumers (e.g., MapReduce applications) are seen as a promising solution. In this article, we propose an efficient data-placement technique that considers network traffic reduction as well as QoS guarantees for the data blocks to optimize the communication resources. We first formulate the joint optimization of the data-placement problem, propose a generic model for minimizing communication costs, and show that the joint data-placement problem is NP-hard. To solve this problem, we propose a heuristic algorithm that considers traffic flows in the network topology of data centers. It first seeks an optimal QoS-aware data placement based on golden division on a Zipf-like replica distribution, then transforms the joint data-placement problem into a block-dependence tree (BDT) construction problem, and finally reduces the BDT construction to a graph-partitioning problem. The experimental results demonstrate that our data-placement approach can effectively improve the performance of MapReduce jobs, with lower communication costs and shorter job execution time for big-data processing.
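The abstract refers to a Zipf-like replica distribution without giving its form. As a rough illustration only, the sketch below assumes block popularity falls off as 1 / rank^alpha and splits a fixed replica budget in proportion to it. The function names, the `alpha` and `min_replicas` parameters, and the largest-remainder rounding are all hypothetical choices made for this example; this is not the authors' golden-division procedure, which the abstract does not describe.

```python
def zipf_weights(n_blocks, alpha=1.0):
    """Normalized Zipf-like weights: rank r gets weight proportional to 1 / r**alpha."""
    raw = [1.0 / (rank ** alpha) for rank in range(1, n_blocks + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def replica_counts(n_blocks, replica_budget, alpha=1.0, min_replicas=1):
    """Split a replica budget across blocks in proportion to Zipf-like popularity.

    Assumption for illustration: every block keeps at least min_replicas copies,
    and the spare budget is shared proportionally with largest-remainder rounding.
    """
    if replica_budget < n_blocks * min_replicas:
        raise ValueError("budget too small for the minimum replica count")
    weights = zipf_weights(n_blocks, alpha)
    spare = replica_budget - n_blocks * min_replicas
    shares = [w * spare for w in weights]
    counts = [min_replicas + int(s) for s in shares]
    leftover = replica_budget - sum(counts)
    # Give the remaining replicas to the blocks with the largest fractional
    # share; ties go to the more popular (lower-rank) block.
    order = sorted(range(n_blocks), key=lambda i: (-(shares[i] - int(shares[i])), i))
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Blocks are listed from most to least popular.
    print(replica_counts(n_blocks=4, replica_budget=10))
```

A larger `alpha` concentrates replicas on the most popular blocks, while `alpha = 0` spreads the spare budget evenly.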
Pages: 120-136
Page count: 17
Related papers
50 in total
  • [1] QoS-aware Task Placement in Geo-distributed Data Centers with Low OPEX using Dynamic Frequency Scaling
    Gu, Lin
    Zeng, Deze
    Guo, Song
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 80 - 84
  • [2] Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers
    Zhang, Jinghui
    Chen, Jian
    Luo, Junzhou
    Song, Aibo
    TSINGHUA SCIENCE AND TECHNOLOGY, 2016, 21 (05) : 471 - 481
  • [3] Efficient Location-Aware Data Placement for Data-Intensive Applications in Geo-distributed Scientific Data Centers
    Jinghui Zhang
    Jian Chen
    Junzhou Luo
    Aibo Song
    Tsinghua Science and Technology, 2016, 21 (05) : 471 - 481
  • [4] MapReduce Task Scheduling in Heterogeneous Geo-Distributed Data Centers
    Li, Xiaoping
    Chen, Fuchao
    Ruiz, Ruben
    Zhu, Jie
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (06) : 3317 - 3329
  • [5] Optimal Task Placement with QoS Constraints in Geo-Distributed Data Centers Using DVFS
    Gu, Lin
    Zeng, Deze
    Barnawi, Ahmed
    Guo, Song
    Stojmenovic, Ivan
    IEEE TRANSACTIONS ON COMPUTERS, 2015, 64 (07) : 2049 - 2059
  • [6] Power and Cost-aware Virtual Machine Placement in Geo-distributed Data Centers
    Rawas, Soha
    Zekri, Ahmed
    El Zaart, Ali
    CLOSER: PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND SERVICES SCIENCE, 2018, : 112 - 123
  • [7] Location-aware Associated Data Placement for Geo-distributed Data-intensive Applications
    Yu, Boyang
    Pan, Jianping
    2015 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (INFOCOM), 2015,
  • [8] Temperature Aware Workload Management in Geo-Distributed Data Centers
    Xu, Hong
    Feng, Chen
    Li, Baochun
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (06) : 1743 - 1753
  • [9] DRASH: A Data Replication-Aware Scheduler in Geo-distributed Data Centers
    Convolbo, Moise W.
    Chou, Jerry
    Lu, Shihyu
    Chung, Yeh Ching
    2016 8TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM 2016), 2016, : 302 - 309
  • [10] QoS-aware replica placement for data intensive applications
    FU Xiong
    ZHU Xin-xin
    HAN Jing-yu
    WANG Ru-chuan
    The Journal of China Universities of Posts and Telecommunications, 2013, 20 (03) : 43 - 47