Scaling HDFS with a Strongly Consistent Relational Model for Metadata

Authors
Hakimzadeh, Kamal [1]
Sajjad, Hooman Peiro [1]
Dowling, Jim [1]
Affiliation
[1] KTH Royal Inst Technol, SICS, Stockholm, Sweden
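Keywords
DOI
Not available
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract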
The Hadoop Distributed File System (HDFS) scales to store tens of petabytes of data even though the entire file system's metadata must fit on the heap of a single Java virtual machine. In production, the size of HDFS metadata is limited to under 100 GB, because garbage collection events in bigger clusters cause heartbeats to the metadata server (NameNode) to time out. In this paper, we address the problem of how to migrate HDFS metadata to a relational model, so that we can support larger amounts of storage on a shared-nothing, in-memory, distributed database. Our main contribution is to show how to provide consistency semantics at least as strong as those of HDFS while adding support for a multiple-writer, multiple-reader concurrency model. We guarantee freedom from deadlocks by logically organizing inodes (and their constituent blocks and replicas) into a hierarchy and having all metadata operations agree on a global order for acquiring both explicit locks and implicit locks on subtrees in the hierarchy. We use transactions with pessimistic concurrency control to ensure the safety and progress of metadata operations. Finally, we show how to improve the performance of our solution by introducing a snapshotting mechanism at NameNodes that minimizes the number of round trips to the database.
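The deadlock-freedom argument in the abstract rests on every operation acquiring its locks in one agreed global order. The following is a minimal sketch of that idea only, not the authors' implementation: it uses in-process Python locks rather than database row locks inside transactions, and the names `InodeLockTable`, `global_order`, `rename`, and `demo` are hypothetical. The order assumed here is parent before child, with siblings in lexicographic order.

```python
import threading
from contextlib import contextmanager


class InodeLockTable:
    """Hypothetical in-memory lock table with one lock per inode path."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()  # protects the dictionary itself

    def _lock_for(self, path):
        with self._guard:
            return self._locks.setdefault(path, threading.Lock())

    @staticmethod
    def global_order(paths):
        # Assumed total order: compare paths component by component, so a
        # parent (a prefix) always sorts before its children.
        return sorted(set(paths),
                      key=lambda p: tuple(c for c in p.split("/") if c))

    @contextmanager
    def acquire(self, paths):
        # Every caller locks in the same global order, so no cycle of
        # waiting operations can form.
        ordered = self.global_order(paths)
        held = []
        try:
            for path in ordered:
                lock = self._lock_for(path)
                lock.acquire()
                held.append(lock)
            yield ordered
        finally:
            for lock in reversed(held):
                lock.release()


def rename(table, src, dst, log):
    # The caller's argument order does not influence the locking order.
    with table.acquire([src, dst]) as order:
        log.append(tuple(order))


def demo():
    table = InodeLockTable()
    log = []
    threads = [
        threading.Thread(target=rename, args=(table, "/a/x", "/b/y", log)),
        threading.Thread(target=rename, args=(table, "/b/y", "/a/x", log)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `demo()` runs two renames that name the same two inodes in opposite order. Because both sort their lock set first, each takes `/a/x` before `/b/y`, which is what rules out the classic two-lock deadlock.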
Pages: 38-51
Page count: 14