HBA: Distributed metadata management for large cluster-based storage systems

Cited by: 49
Authors
Zhu, Yifeng [1]
Jiang, Hong [2]
Wang, Jun [3]
Xian, Feng [2]
Affiliations
[1] Univ Maine, Dept Elect & Comp Engn, Orono, ME 04473 USA
[2] Univ Nebraska, Dept Comp Sci & Engn, Lincoln, NE 68588 USA
[3] Univ Cent Florida, Sch Elect Engn & Comp Sci, Orlando, FL 32816 USA
Funding
US National Science Foundation (NSF); US National Aeronautics and Space Administration (NASA)
Keywords
distributed file systems; file system management; metadata management; Bloom filter
DOI
10.1109/TPDS.2007.70788
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, Bloom filter arrays with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the temporal locality of file access patterns. Both arrays are replicated to all metadata servers to support fast local lookups. We evaluate HBA through extensive trace-driven simulations and implementation in Linux. Simulation results show our HBA design to be highly effective and efficient in improving the performance and scalability of file systems in clusters with 1,000 to 10,000 nodes (or superclusters) and with the amount of data in the petabyte scale or higher. Our implementation indicates that HBA can reduce the metadata operation time of a single-metadata-server architecture by a factor of up to 43.9 when the system is configured with 16 metadata servers.
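The two-level lookup described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names (`BloomFilter`, `BloomFilterArray`), the `lookup` function, the SHA-256 double-hashing scheme, the filter sizes, and the example paths are all assumptions chosen for illustration. In particular, the abstract does not say what happens when neither array yields a unique hit, so the sketch simply reports a `"fallback"` result in that case.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter (hypothetical helper, double hashing over SHA-256)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd, non-zero stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


class BloomFilterArray:
    """One Bloom filter per metadata server; replicated on every server."""

    def __init__(self, num_servers, num_bits, num_hashes):
        self.filters = [BloomFilter(num_bits, num_hashes)
                        for _ in range(num_servers)]

    def add(self, server_id, filename):
        self.filters[server_id].add(filename)

    def candidates(self, filename):
        # Every server whose filter claims to hold this filename.
        return [sid for sid, bf in enumerate(self.filters) if filename in bf]


def lookup(filename, hot_array, global_array):
    """Try the higher-accuracy array first, then the lower-accuracy one.

    Returns (server_id, level) on a unique hit. The (None, "fallback")
    result for zero or multiple hits is an assumption of this sketch.
    """
    for level, array in (("hot", hot_array), ("global", global_array)):
        hits = array.candidates(filename)
        if len(hits) == 1:
            return hits[0], level
    return None, "fallback"


# Illustrative configuration: more bits per filter in the small "hot" array
# (higher accuracy), fewer bits in the array covering all metadata.
NUM_SERVERS = 4
hot = BloomFilterArray(NUM_SERVERS, num_bits=4096, num_hashes=7)
glob = BloomFilterArray(NUM_SERVERS, num_bits=1024, num_hashes=3)

glob.add(2, "/home/a/report.txt")   # known only to the global array
glob.add(1, "/home/b/data.bin")
hot.add(1, "/home/b/data.bin")      # recently accessed, also cached in the hot array

print(lookup("/home/b/data.bin", hot, glob))
print(lookup("/home/a/report.txt", hot, glob))
```

Because both arrays are held locally on every metadata server, a lookup of this kind needs no network round trip when it produces a unique hit, which is the property the abstract attributes to replication.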
Pages: 750-763
Number of pages: 14
Related papers
  • [1] Hierarchical Bloom Filter Arrays (HBA): A novel, scalable metadata management system for large cluster-based storage
    Zhu, YF
    Jiang, H
    Wang, J
    [J]. 2004 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING, 2004 : 165 - 174
  • [2] Efficient metadata management in large distributed storage systems
    Brandt, SA
    Miller, EL
    Long, DDE
    Xue, L
    [J]. 20TH IEEE/11TH NASA GODDARD CONFERENCE ON MASS STORAGE AND TECHNOLOGIES (MSST 2003), PROCEEDINGS, 2003 : 290 - 298
  • [3] A Novel Dynamic Metadata Management Scheme for Large Distributed Storage Systems
    Fu, Yinjin
    Xiao, Nong
    Zhou, Enqiang
    [J]. HPCC 2008: 10TH IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2008 : 987 - 992
  • [4] Synchronous Metadata Management of Large Storage Systems
    Hackl, Guenter
    Pausch, Wolfgang
    Schoenherr, Sebastian
    Specht, Guenther
    Thiel, Gunther
    [J]. PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL DATABASE ENGINEERING & APPLICATIONS SYMPOSIUM (IDEAS '10), 2010 : 1 - 6
  • [5] DEPENDABILITY EVALUATION OF CLUSTER-BASED DISTRIBUTED SYSTEMS
    Anceaume, Emmanuelle
    Brasileiro, Francisco
    Ludinard, Romaric
    Sericola, Bruno
    Tronel, Frederic
    [J]. INTERNATIONAL JOURNAL OF FOUNDATIONS OF COMPUTER SCIENCE, 2011, 22 (05) : 1123 - 1142
  • [6] Design and evaluation of large scale loosely coupled cluster-based distributed systems
    Sakamoto, Kenji
    Yoshida, Makoto
    [J]. 2007 IFIP INTERNATIONAL CONFERENCE ON NETWORK AND PARALLEL COMPUTING WORKSHOPS, PROCEEDINGS, 2007 : 572 - +
  • [7] Scalable Metadata Management Techniques for Ultra-Large Distributed Storage Systems - A Systematic Review
    Singh, Harcharan Jit
    Bawa, Seema
    [J]. ACM COMPUTING SURVEYS, 2018, 51 (04)
  • [8] Cluster-Based Distributed Algorithms for Very Large Linear Equations
    古志民
    MARTA Kwiatkowska
    付引霞
    [J]. Journal of Beijing Institute of Technology, 2006, (01) : 66 - 70
  • [9] Cluster-Based Distributed Consensus
    Li, Wenjun
    Dai, Huaiyu
    [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2009, 8 (01) : 28 - 31
  • [10] Cluster-based distributed algorithm for energy management in smart grids
    Brettschneider, Daniel
    Hoelker, Daniel
    Roer, Peter
    Toenjes, Ralf
    [J]. COMPUTER SCIENCE-RESEARCH AND DEVELOPMENT, 2016, 31 (1-2) : 17 - 23