An Efficient Replicated System for the Metadata of HDFS

Authors
Wang, Zhanye [1]
Xu, Tao [1]
Wang, Dongsheng [2]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Res Inst Informat Technol, Beijing 100084, Peoples R China
Keywords
HDFS; namenode; metadata; availability; replication; NCluster;
DOI
10.14257/ijgdc.2016.9.5.16
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
Hadoop HDFS is an open-source project from the Apache Software Foundation for scalable, distributed computing and data storage. HDFS has become a critical component of today's cloud computing environments, and a wide range of applications are built on top of it. However, the initial design of HDFS introduced a single point of failure: HDFS contains only one active namenode, and if this namenode experiences a software or hardware failure, the whole HDFS cluster becomes unusable. This is one reason why people are reluctant to deploy HDFS for applications that require high availability. In this paper, we present a solution that enables high availability for the HDFS namenode through efficient metadata replication. Our solution has three major advantages over existing ones: (1) we use multiple active namenodes, instead of one, to build a cluster that serves metadata requests simultaneously; (2) we implement a pub/sub system to handle metadata replication across these active namenodes efficiently; and (3) we propose a novel replication algorithm to deal with network delay when the namenodes are deployed in different areas. Based on this solution, we build a prototype called NCluster and integrate it with HDFS. We evaluate NCluster to demonstrate its feasibility and effectiveness. The experimental results show that our solution performs well, with low replication cost, good throughput, and good scalability.
Pages: 175 - 190
Number of pages: 16