Locality-aware allocation of multi-dimensional correlated files on the cloud platform

Cited by: 0
Authors
Xiaofei Zhang
Yongxin Tong
Lei Chen
Min Wang
Shicong Feng
Affiliations
[1] HKUST
[2] Google Research, USA
[3] Miao Zhen Company
Source
Keywords
Distributed data allocation; Cloud storage; Multi-dimensional correlation; Subspace locality
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
The effective management of enormous data volumes on the Cloud platform has attracted considerable research effort. In this paper, we study the problem of allocating files with multi-dimensional correlations on the Cloud platform, such that files can be retrieved and processed more efficiently. Currently, most prevailing Cloud file systems allocate data following the principles of fault tolerance and availability, while inter-file correlations, i.e., files being correlated with one another, are often neglected. In practice, data files are commonly correlated in various ways, and correlated files are likely to be involved in the same computation process. This raises a new challenge: allocating files with multi-dimensional correlations while taking “subspace locality” into consideration to improve the system throughput. We propose two allocation methods for multi-dimensional correlated files stored on the Cloud platform, such that the I/O efficiency and data access locality are improved in the MapReduce processing paradigm, without hurting the fault tolerance and availability properties of the underlying file systems. Different from the techniques proposed in [1,2], which quickly map the locations of desired data for a given query $\mathcal{Q}$, we focus on improving the system throughput for batch jobs over correlated data files. We formulate the problem and study a series of solutions on HDFS [9]. Evaluations with real application scenarios demonstrate the effectiveness of our proposals: significant I/O and network costs can be saved during data retrieval and processing. For batch OLAP jobs in particular, our solution achieves a well-balanced workload among distributed computing nodes.
Pages: 353 - 380
Page count: 27
Related papers
50 items in total
  • [1] Locality-aware allocation of multi-dimensional correlated files on the cloud platform
    Zhang, Xiaofei
    Tong, Yongxin
    Chen, Lei
    Wang, Min
    Feng, Shicong
    [J]. DISTRIBUTED AND PARALLEL DATABASES, 2015, 33 (03) : 353 - 380
  • [2] Locality-Aware Scheduling for Containers in Cloud Computing
    Babu, G. Charles
    Hanuman, A. Sai
    Kiran, J. Sasi
    Babu, B. Sankara
    [J]. INVENTIVE COMMUNICATION AND COMPUTATIONAL TECHNOLOGIES, ICICCT 2019, 2020, 89 : 177 - 185
  • [3] Locality-Aware Scheduling for Containers in Cloud Computing
    Zhao, Dongfang
    Mohamed, Mohamed
    Ludwig, Heiko
    [J]. IEEE TRANSACTIONS ON CLOUD COMPUTING, 2020, 8 (02) : 635 - 646
  • [4] Toward Locality-aware Scheduling for Containerized Cloud Services
    Zhao, Dongfang
    Mandagere, Nagapramod
    Alatorre, Gabriel
    Mohamed, Mohamed
    Ludwig, Heiko
    [J]. PROCEEDINGS 2015 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2015, : 263 - 270
  • [5] Locality-Aware Load Sharing in Mobile Cloud Computing
    Jonathan, Albert
    Chandra, Abhishek
    Weissman, Jon
    [J]. PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON UTILITY AND CLOUD COMPUTING (UCC' 17), 2017, : 141 - 150
  • [6] Energy-aware Multi-dimensional Resource Allocation Algorithm in Cloud Data Center
    Nie, Jiawei
    Luo, Juan
    Yin, Luxiu
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2017, 11 (09): : 4320 - 4333
  • [7] NEST: Locality-aware Approximate Query Service for Cloud Computing
    Hua, Yu
    Xiao, Bin
    Liu, Xue
    [J]. 2013 PROCEEDINGS IEEE INFOCOM, 2013, : 1303 - 1311
  • [8] Penalty- and Locality-aware Memory Allocation in Redis Using Enhanced AET
    Pan, Cheng
    Wang, Xiaolin
    Luo, Yingwei
    Wang, Zhenlin
    [J]. ACM TRANSACTIONS ON STORAGE, 2021, 17 (02)
  • [9] Locality-aware process placement for parallel and distributed simulation in cloud data centers
    Zaheer, Saad
    Malik, Asad Waqar
    Rahman, Anis Ur
    Khan, Safdar Abbas
    [J]. JOURNAL OF SUPERCOMPUTING, 2019, 75 (11): : 7723 - 7745
  • [10] Locality-aware process placement for parallel and distributed simulation in cloud data centers
    Saad Zaheer
    Asad Waqar Malik
    Anis Ur Rahman
    Safdar Abbas Khan
    [J]. The Journal of Supercomputing, 2019, 75 : 7723 - 7745