PABIRS: A Data Access Middleware for Distributed File Systems

Authors
Wu, Sai [1 ]
Chen, Gang [1 ]
Zhou, Xianke [2 ]
Zhang, Zhenjie [3 ]
Tung, Anthony K. H. [4 ]
Winslett, Marianne [5 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] NetEase Hangzhou Network Co Ltd, Hangzhou, Zhejiang, Peoples R China
[3] Illinois Singapore Pte Ltd, Adv Digital Sci Ctr, Singapore, Singapore
[4] Natl Univ Singapore, Sch Comp, Singapore, Singapore
[5] Univ Illinois, Dept Comp Sci, Champaign, IL USA
Funding
National Science Foundation (NSF)
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Various big data management systems have emerged to handle different types of applications, which place very different demands on the storage, indexing and retrieval of large amounts of data on distributed file systems. Such diverse demands pose huge challenges to the design of a new generation of data access services for big data. In this paper, we present PABIRS, a unified data access middleware that supports mixed workloads. PABIRS encapsulates the underlying distributed file system (DFS) and provides a unified access interface to systems such as MapReduce and key-value stores. PABIRS achieves dramatic efficiency improvements by employing a novel hybrid indexing scheme: based on the data distribution, the scheme adaptively builds bitmap indexes and Log-Structured Merge-Tree (LSM) indexes. Moreover, PABIRS distributes the computation across multiple index nodes and uses a Pregel-based algorithm to facilitate parallel data search and retrieval. We empirically evaluate PABIRS against other existing distributed data processing systems on big data drawn from real-life phone logs and the TPC-H benchmark, and verify its substantial advantages in response time, throughput and scalability.
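The abstract states that PABIRS chooses between a bitmap index and an LSM index according to the data distribution, but it does not give the selection rule. The following is only a minimal, single-node sketch of that general idea under an assumed rule: attributes whose distinct-value count is at or below a hypothetical threshold `BITMAP_MAX_DISTINCT` get a bitmap index, and all others get a toy LSM-style index (write buffer, sorted runs, merge compaction). The names `BitmapIndex`, `TinyLSMIndex` and `build_index` are hypothetical and are not the PABIRS API; the Pregel-based distributed search is not modeled.

```python
import heapq
from bisect import bisect_left
from collections import defaultdict

# Hypothetical cut-off (an assumption, not from the paper): attributes with
# few distinct values get a bitmap index; all others get an LSM-style index.
BITMAP_MAX_DISTINCT = 64


class BitmapIndex:
    """One bitmap per distinct value, stored as a Python int (bit i = row i)."""

    def __init__(self, values):
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            self.bitmaps[value] |= 1 << row

    def lookup(self, value):
        bits = self.bitmaps.get(value, 0)
        return [row for row in range(bits.bit_length()) if (bits >> row) & 1]


class TinyLSMIndex:
    """Toy LSM-style index: a small write buffer flushed to immutable sorted runs."""

    def __init__(self, buffer_limit=16, max_runs=4):
        self.buffer = []   # recent (key, row) pairs, unsorted
        self.runs = []     # sorted, immutable runs of (key, row)
        self.buffer_limit = buffer_limit
        self.max_runs = max_runs

    def insert(self, key, row):
        self.buffer.append((key, row))
        if len(self.buffer) >= self.buffer_limit:
            self.runs.append(sorted(self.buffer))  # flush buffer as a new run
            self.buffer = []
        if len(self.runs) > self.max_runs:
            # Compaction: merge all sorted runs into a single sorted run.
            self.runs = [list(heapq.merge(*self.runs))]

    def lookup(self, key):
        rows = [row for k, row in self.buffer if k == key]
        for run in self.runs:
            i = bisect_left(run, (key,))  # (key,) sorts before any (key, row)
            while i < len(run) and run[i][0] == key:
                rows.append(run[i][1])
                i += 1
        return sorted(rows)


def build_index(values):
    """Pick an index type from the attribute's distinct-value count (assumed rule)."""
    if len(set(values)) <= BITMAP_MAX_DISTINCT:
        return BitmapIndex(values)
    index = TinyLSMIndex()
    for row, value in enumerate(values):
        index.insert(value, row)
    return index
```

For example, `build_index(["a", "b", "a", "c"])` yields a bitmap index whose `lookup("a")` returns `[0, 2]`, while a high-cardinality column falls through to the LSM-style index. A bitmap suits low-cardinality attributes because each value maps to one compact bit vector, whereas an LSM structure keeps writes cheap for attributes with many distinct keys; the real system's criterion may differ.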
Pages: 113-124
Number of pages: 12
Related Papers (50 in total)
  • [1] The Design and Implementation of Distributed File Access Middleware
    Zhang, Buzhong
    Jin, Haidong
    [J]. DCABES 2008 PROCEEDINGS, VOLS I AND II, 2008, : 89 - +
  • [2] Opass: Analysis and Optimization of Parallel Data Access on Distributed File Systems
    Yin, Jiangling
    Wang, Jun
    Zhou, Jian
    Lukasiewicz, Tyler
    Huang, Dan
    Zhang, Junyao
    [J]. 2015 IEEE 29TH INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM (IPDPS), 2015, : 623 - 632
  • [3] Achieving Load Balance for Parallel Data Access on Distributed File Systems
    Huang, Dan
    Han, Dezhi
    Wang, Jun
    Yin, Jiangling
    Chen, Xunchao
    Zhang, Xuhong
    Zhou, Jian
    Ye, Mao
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (03) : 388 - 402
  • [4] Distributed Data Management and Distributed File Systems
    Girone, Maria
    [J]. 21ST INTERNATIONAL CONFERENCE ON COMPUTING IN HIGH ENERGY AND NUCLEAR PHYSICS (CHEP2015), PARTS 1-9, 2015, 664
  • [5] Decentralized access control in distributed file systems
    Miltchev, Stefan
    Smith, Jonathan M.
    Prevelakis, Vassilis
    Keromytis, Angelos
    Ioannidis, Sotiris
    [J]. ACM COMPUTING SURVEYS, 2008, 40 (03)
  • [6] Efficient access control for distributed hierarchical file systems
    Pollack, KT
    Brandt, SA
    [J]. TWENTY-SECOND IEEE/THIRTEENTH NASA GODDARD CONFERENCE ON MASS STORAGE SYSTEMS AND TECHNOLOGIES, PROCEEDINGS: INFORMATION RETRIEVAL FROM VERY LARGE STORAGE SYSTEMS, 2005, : 253 - 260
  • [7] Data Consistency Protocol for Distributed File Systems
    No, Jaechun
    [J]. 2009 IEEE INTERNATIONAL WORKSHOP ON INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS: TECHNOLOGY AND APPLICATIONS, 2009, : 253 - 258
  • [8] Distributed File System: Efficiency Experiments for Data Access and Communication
    Upadhyaya, Bipin
    Azimov, Fahriddin
    Doan, Thanh Tran
    Choi, Eunmi
    Kim, SangBum
    Kim, Pilsung
    [J]. NCM 2008: 4TH INTERNATIONAL CONFERENCE ON NETWORKED COMPUTING AND ADVANCED INFORMATION MANAGEMENT, VOL 2, PROCEEDINGS, 2008, : 400 - +
  • [9] A distributed data management middleware for data-driven application systems
    Langella, S
    Hastings, S
    Oster, S
    Kurc, T
    Catalyurek, U
    Saltz, J
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2004, : 267 - 276
  • [10] Efficient structured data access in parallel file systems
    Ching, A
    Choudhary, A
    Liao, WK
    Ross, R
    Gropp, W
    [J]. IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, PROCEEDINGS, 2003, : 326 - 335