Overcoming data locality: An in-memory runtime file system with symmetrical data distribution

Cited by: 6
Authors
Uta, Alexandru [1]
Sandu, Andreea [1]
Kielmann, Thilo [1]
Affiliations
[1] Vrije Univ Amsterdam, Dept Comp Sci, Amsterdam, Netherlands
Keywords
Many-task computing; In-memory file system; Distributed hashing; Scalability
DOI
10.1016/j.future.2015.01.013
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage. (c) 2015 Elsevier B.V. All rights reserved.
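The striping idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify the hash function, stripe size, or node addressing that MemFS uses, so everything below is an assumption made for illustration: `STRIPE_SIZE`, `node_for`, `stripe`, and `reassemble` are hypothetical names, SHA-256 with a modulo is just one simple way to build a distributed hash, and the nodes are simulated as in-process lists.

```python
import hashlib

# Hypothetical stripe size in bytes; kept tiny so the example is easy to inspect.
STRIPE_SIZE = 4


def node_for(path: str, stripe_index: int, num_nodes: int) -> int:
    """Map one stripe of a file to a node by hashing (path, stripe index).

    Assumption: plain hash-modulo placement, not necessarily the scheme
    used by MemFS.
    """
    key = f"{path}:{stripe_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def stripe(path: str, data: bytes, num_nodes: int, stripe_size: int = STRIPE_SIZE):
    """Split data into fixed-size stripes and assign each stripe to a node.

    Returns a dict: node id -> list of (stripe_index, chunk) pairs.
    """
    placement = {node: [] for node in range(num_nodes)}
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        chunk = data[offset:offset + stripe_size]
        placement[node_for(path, index, num_nodes)].append((index, chunk))
    return placement


def reassemble(placement) -> bytes:
    """Rebuild the file by collecting stripes from all nodes in index order."""
    stripes = sorted(
        (pair for chunks in placement.values() for pair in chunks),
        key=lambda pair: pair[0],
    )
    return b"".join(chunk for _, chunk in stripes)


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    placement = stripe("/workflow/intermediate.dat", data, num_nodes=8)
    for node, chunks in placement.items():
        print(f"node {node}: {len(chunks)} stripes")
    print("round trip ok:", reassemble(placement) == data)
```

Because placement depends only on the file name and stripe index, any node can compute where a stripe lives without consulting the node that wrote it, which is the property that decouples data distribution from task distribution in this sketch.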
Pages: 144-158
Page count: 15