QoS-Aware Data Placement for MapReduce Applications in Geo-Distributed Data Centers

Cited by: 10
Authors
Chen, Wuhui [1 ,2 ]
Liu, Baichuan [1 ,2 ]
Paik, Incheon [3 ]
Li, Zhenni [4 ]
Zheng, Zibin [1 ,2 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510085, Peoples R China
[2] Sun Yat Sen Univ, Natl Engn Res Ctr Digital Life, Guangzhou 510085, Peoples R China
[3] Univ Aizu, Sch Comp Sci & Engn, Aizu Wakamatsu, Fukushima 9650006, Japan
[4] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data centers; Quality of service; Data transfer; Distributed databases; Data models; Optimization; Network topology; Big-data processing; data placement; geo-distributed data centers; QoS aware; BIG DATA; CLOUD;
DOI
10.1109/TEM.2020.2971717
Chinese Library Classification number
F [Economics];
Discipline classification code
02 ;
Abstract
With growing data volumes and the scaling of data center clusters, communication resources often become a bottleneck in service provisioning for many MapReduce applications (e.g., training machine learning models). Therefore, data placements that bring data blocks closer to data consumers (e.g., MapReduce applications) are seen as a promising solution. In this article, we propose an efficient data-placement technique that considers network traffic reduction as well as QoS guarantees for the data blocks to optimize the communication resources. We first formulate the joint optimization of the data-placement problem, propose a generic model for minimizing communication costs, and show that the joint data-placement problem is NP-hard. To solve this problem, we propose a heuristic algorithm considering traffic flows in the network topology of data centers by first seeking optimal QoS-aware data placement based on golden division on a Zipf-like replica distribution, then transforming the joint data-placement problem into a block-dependence tree (BDT) construction problem, and finally reducing the BDT construction to a graph-partitioning problem. The experimental results demonstrate that our data-placement approach could effectively improve the performance of MapReduce jobs with lower communication costs and less job execution time for big-data processing.
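The abstract refers to a Zipf-like replica distribution but this record gives no further detail. As a rough illustration only, and not the authors' algorithm, the sketch below shows one common reading of the idea: blocks are ranked by popularity, a block at rank r gets a weight proportional to 1/r^alpha, and a fixed replica budget is split according to those weights. The function names, the `alpha` exponent, the `replica_budget` parameter, and the rounding rule are all assumptions made for this example.

```python
def zipf_like_weights(num_blocks, alpha=0.8):
    """Normalized Zipf-like weights for blocks ranked 1..num_blocks.

    alpha is a hypothetical skew exponent; larger values concentrate
    more weight on the most popular blocks.
    """
    raw = [1.0 / (rank ** alpha) for rank in range(1, num_blocks + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def allocate_replicas(num_blocks, replica_budget, alpha=0.8, min_replicas=1):
    """Split a replica budget across blocks following Zipf-like weights.

    Illustrative assumption, not the paper's method: every block keeps
    at least min_replicas, the spare budget is shared in proportion to
    the weights (rounded down), and replicas lost to rounding are handed
    out one each to the most popular blocks.
    """
    if replica_budget < num_blocks * min_replicas:
        raise ValueError("budget too small for the minimum replica count")
    weights = zipf_like_weights(num_blocks, alpha)
    spare = replica_budget - num_blocks * min_replicas
    counts = [min_replicas + int(spare * w) for w in weights]
    leftover = replica_budget - sum(counts)
    for i in range(leftover):
        counts[i % num_blocks] += 1
    return counts


if __name__ == "__main__":
    # Five blocks ranked by popularity sharing fifteen replicas.
    print(allocate_replicas(num_blocks=5, replica_budget=15))
```

Under these assumptions the most popular blocks receive the most replicas while every block keeps at least one copy. The QoS constraints, the golden-division step, and the block-dependence tree mentioned in the abstract are not modeled here.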
Pages: 120-136
Number of pages: 17
Related papers
50 in total
  • [21] Cost-Aware Streaming Workflow Allocation on Geo-Distributed Data Centers
    Chen, Wuhui
    Paik, Incheon
    Li, Zhenni
    IEEE TRANSACTIONS ON COMPUTERS, 2017, 66 (02) : 256 - 271
  • [22] Joint Data Purchasing and Data Placement in a Geo-Distributed Data Market
    Ren, Xiaoqi
    London, Palma
    Ziani, Juba
    Wierman, Adam
    SIGMETRICS/PERFORMANCE 2016: PROCEEDINGS OF THE SIGMETRICS/PERFORMANCE JOINT INTERNATIONAL CONFERENCE ON MEASUREMENT AND MODELING OF COMPUTER SCIENCE, 2016, : 383 - 384
  • [23] QoS-aware replica placement in data grids
    Fu, Xiong
    Wang, Yi-Bo
    Zhu, Xin-Xin
    Han, Jin-Yu
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2014, 36 (04): : 784 - 788
  • [24] A MapReduce Cluster Deployment Optimization Framework with Geo-distributed Data
    Li, Shanshan
    Lu, Qinghua
    Zhang, Weishan
    Zhu, Liming
    IEEE 12TH INT CONF UBIQUITOUS INTELLIGENCE & COMP/IEEE 12TH INT CONF ADV & TRUSTED COMP/IEEE 15TH INT CONF SCALABLE COMP & COMMUN/IEEE INT CONF CLOUD & BIG DATA COMP/IEEE INT CONF INTERNET PEOPLE AND ASSOCIATED SYMPOSIA/WORKSHOPS, 2015, : 943 - 949
  • [25] Renewable Energy-Aware Big Data Analytics in Geo-Distributed Data Centers with Reinforcement Learning
    Xu, Chenhan
    Wang, Kun
    Li, Peng
    Xia, Rui
    Guo, Song
    Guo, Minyi
    IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING, 2020, 7 (01): : 205 - 215
  • [26] Green Computing with Geo-Distributed Heterogeneous Data Centers
    Pasricha, Sudeep
    Hogade, Ninad
    Siegel, Howard Jay
    Maciejewski, Anthony A.
    2019 TENTH INTERNATIONAL GREEN AND SUSTAINABLE COMPUTING CONFERENCE (IGSC), 2019,
  • [27] Yugong: Geo-Distributed Data and Job Placement at Scale
    Huang, Yuzhen
    Shi, Yingjie
    Zhong, Zheng
    Feng, Yihui
    Cheng, James
    Li, Jiwei
    Fang, Haochuan
    Li, Chao
    Guan, Tao
    Zhou, Jingren
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2019, 12 (12): : 2155 - 2169
  • [28] GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers
    Convolbo, Moise W.
    Chou, Jerry
    Hsu, Ching-Hsien
    Chung, Yeh Ching
    COMPUTING, 2018, 100 (01) : 21 - 46
  • [29] GEODIS: towards the optimization of data locality-aware job scheduling in geo-distributed data centers
    Moïse W. Convolbo
    Jerry Chou
    Ching-Hsien Hsu
    Yeh Ching Chung
    Computing, 2018, 100 : 21 - 46
  • [30] An Optimal Task Placement Strategy in Geo-Distributed Data Centers Involving Renewable Energy
    Wang, Ran
    Lu, Yiwen
    Zhu, Kun
    Hao, Jie
    Wang, Ping
    Cao, Yue
    IEEE ACCESS, 2018, 6 : 61948 - 61958