Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems

Cited by: 12
Authors
Yin, Jiangling [1]
Wang, Jun [1]
Zhou, Jian [1]
Lukasiewicz, Tyler [1]
Huang, Dan [1]
Zhang, Junyao [1]
Affiliations
[1] Univ Cent Florida, Dept Elect Engn & Comp Sci, Orlando, FL 32816 USA
Keywords
Parallel Data Access; Distributed File Systems; Bipartite Matching
DOI
10.1109/IPDPS.2015.55
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, we study parallel data access on distributed file systems, e.g., the Hadoop file system. Our experiments show that parallel data read requests are often served remotely and in an imbalanced fashion. This results in serious disk access and data transfer contention on certain cluster/storage nodes. We conduct a complete analysis of how remote and imbalanced read patterns occur and how they are affected by the size of the cluster. We then propose a novel method to Optimize Parallel Data Access on Distributed File Systems, referred to as Opass. The goal of Opass is to reduce remote parallel data accesses and achieve a better balance of data read requests across cluster nodes. To achieve this goal, we represent the data read requests that parallel applications issue to cluster nodes as a graph data structure in which edge weights encode the demands of data locality and load capacity. We then propose new matching-based algorithms that match processes to data based on the configuration of this graph, so as to compute the maximum degree of data locality and balanced access. Our proposed method can benefit parallel data-intensive analysis with various parallel data access strategies. Experiments are conducted on PRObE's Marmot 128-node cluster testbed, and the results from both benchmarks and well-known parallel applications show the performance benefits and scalability of Opass.
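The matching idea in the abstract can be illustrated with a minimal sketch. This is not the Opass algorithm itself: it assumes an unweighted bipartite graph (the abstract describes weighted edges), a fixed per-process quota standing in for load capacity, and a standard augmenting-path matching. The names `match_chunks_to_processes`, `replicas`, `process_node`, and `quota` are hypothetical and chosen only for illustration.

```python
def match_chunks_to_processes(replicas, process_node, quota):
    """Assign chunks to processes so that as many reads as possible are local.

    replicas:     dict chunk -> set of nodes holding a replica (assumed input)
    process_node: dict process -> node the process runs on (assumed input)
    quota:        max chunks per process, a stand-in for load capacity
    Returns a dict chunk -> process covering the chunks matched locally.
    """
    # Expand each process into `quota` slots so capacity becomes plain matching.
    slots = [(p, i) for p in process_node for i in range(quota)]
    slot_owner = {}  # slot -> chunk currently assigned to it

    def local_slots(chunk):
        # A slot is a candidate only if its process sits on a replica node.
        return [s for s in slots if process_node[s[0]] in replicas[chunk]]

    def try_assign(chunk, visited):
        # Augmenting-path search: take a free slot, or evict and re-place.
        for s in local_slots(chunk):
            if s in visited:
                continue
            visited.add(s)
            if s not in slot_owner or try_assign(slot_owner[s], visited):
                slot_owner[s] = chunk
                return True
        return False

    for chunk in replicas:
        try_assign(chunk, set())
    return {chunk: slot[0] for slot, chunk in slot_owner.items()}


if __name__ == "__main__":
    replicas = {
        "c0": {"n0", "n1"},
        "c1": {"n0", "n1"},
        "c2": {"n0"},
        "c3": {"n0"},
    }
    process_node = {"p0": "n0", "p1": "n1"}
    print(match_chunks_to_processes(replicas, process_node, quota=2))
```

In this toy input, a greedy first-fit assignment would give `c0` and `c1` to `p0` and leave `c2` and `c3` without a local reader; the augmenting-path step moves `c0` and `c1` to `p1` so that every chunk is read locally and each process handles two chunks. Chunks that cannot be matched locally are simply left out of the result and would need a separate remote-read fallback.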
Pages: 623-632
Number of pages: 10
Related papers
50 in total
  • [1] Achieving Load Balance for Parallel Data Access on Distributed File Systems
    Huang, Dan
    Han, Dezhi
    Wang, Jun
    Yin, Jiangling
    Chen, Xunchao
    Zhang, Xuhong
    Zhou, Jian
    Ye, Mao
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (03) : 388 - 402
  • [2] Efficient structured data access in parallel file systems
    Ching, A
    Choudhary, A
    Liao, WK
    Ross, R
    Gropp, W
    [J]. IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, PROCEEDINGS, 2003, : 326 - 335
  • [3] PABIRS: A Data Access Middleware for Distributed File Systems
    Wu, Sai
    Chen, Gang
    Zhou, Xianke
    Zhang, Zhenjie
    Tung, Anthony K. H.
    Winslett, Marianne
    [J]. 2015 IEEE 31ST INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2015, : 113 - 124
  • [4] Small-File Access in Parallel File Systems
    Carns, Philip
    Lang, Sam
    Ross, Robert
    Vilayannur, Murali
    Kunkel, Julian
    Ludwig, Thomas
    [J]. 2009 IEEE INTERNATIONAL SYMPOSIUM ON PARALLEL & DISTRIBUTED PROCESSING, VOLS 1-5, 2009, : 524 - +
  • [5] On Distributed File Tree Walk of Parallel File Systems
    LaFon, Jharrod
    Misra, Satyajayant
    Bringhurst, Jon
    [J]. 2012 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2012,
  • [6] Distributed Data Management and Distributed File Systems
    Girone, Maria
    [J]. 21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [7] Design and analysis of a parallel file system for distributed shared memory systems
    Mac, SC
    Shieh, CK
    Chang, JB
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 1999, 45 (08) : 603 - 617
  • [8] Decentralized access control in distributed file systems
    Miltchev, Stefan
    Smith, Jonathan M.
    Prevelakis, Vassilis
    Keromytis, Angelos
    Ioannidis, Sotiris
    [J]. ACM COMPUTING SURVEYS, 2008, 40 (03)
  • [9] Optimizing remote file access for parallel and distributed network applications
    Weissman, JB
    Marina, M
    Gingras, M
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2001, 61 (11) : 1591 - 1608
  • [10] Optimization of Reading Data via Classified Block Access Patterns in File Systems
    Liao, Jianwei
    Chen, Shanxiong
    [J]. IEEE ACCESS, 2016, 4 : 9421 - 9427