How Fast Can One Scale Down a Distributed File System?

Cited by: 0

Authors:

Cheriere, Nathanael [1]

Antoniu, Gabriel [2]

Affiliations:

[1] ENS Rennes, IRISA, Rennes, France

[2] INRIA, Rennes Bretagne Atlantique Res Ctr, Rennes, France

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017

Keywords:

Elastic Storage; Distributed File System; Malleable File System; Model; Decommission;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficient resource utilization becomes a major concern for Big Data processing as large-scale computing infrastructures such as supercomputers or clouds keep growing in size. Naturally, energy and cost savings can be obtained by reducing idle resources. Malleability, which is the possibility for resource managers to dynamically increase or reduce the resources of jobs, appears as a promising means to progress towards this goal. However, state-of-the-art parallel and distributed file systems have not been designed with malleability in mind. This is mainly due to the supposedly high cost of storage decommission, which is considered to involve expensive data transfers. Nevertheless, as network and storage technologies evolve, old assumptions on potential bottlenecks can be revisited. In this study, we evaluate the viability of malleability as a design principle for a distributed file system. We specifically model the duration of the decommission operation, for which we obtain a theoretical lower bound. Then we consider HDFS as a use case and we show that our model can explain the measured decommission times. The existing decommission mechanism of HDFS performs well when the network is the bottleneck, but could be accelerated by up to a factor of 3 when the storage is the limiting factor. With the insights provided by our model, we suggest improvements to speed up decommission in HDFS and we discuss open perspectives for the design of efficient malleable distributed file systems.


Pages: 141-150

Page count: 10

