Overcoming data locality: An in-memory runtime file system with symmetrical data distribution

Cited by: 6
Authors
Uta, Alexandru [1]
Sandu, Andreea [1]
Kielmann, Thilo [1]
Affiliations
[1] Vrije Univ Amsterdam, Dept Comp Sci, Amsterdam, Netherlands
Keywords
Many-task computing; In-memory file system; Distributed hashing; Scalability
DOI
10.1016/j.future.2015.01.013
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
In many-task computing (MTC), applications such as scientific workflows or parameter sweeps communicate via intermediate files; application performance strongly depends on the file system in use. The state of the art uses runtime systems providing in-memory file storage that is designed for data locality: files are placed on those nodes that write or read them. With data locality, however, task distribution conflicts with data distribution, leading to application slowdown, and worse, to prohibitive storage imbalance. To overcome these limitations, we present MemFS, a fully symmetrical, in-memory runtime file system that stripes files across all compute nodes, based on a distributed hash function. Our cluster experiments with Montage and BLAST workflows, using up to 512 cores, show that MemFS has both better performance and better scalability than the state-of-the-art, locality-based file system, AMFS. Furthermore, our evaluation on a public commercial cloud validates our cluster results. On this platform MemFS shows excellent scalability up to 1024 cores and is able to saturate the 10G Ethernet bandwidth when running BLAST and Montage. (c) 2015 Elsevier B.V. All rights reserved.
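The striping idea in the abstract can be illustrated with a minimal sketch. The abstract does not give MemFS's hash function, stripe size, or storage backend, so everything below is an assumption made for illustration: the names `node_for_stripe`, `stripe_file`, and `reassemble` are hypothetical, SHA-256 stands in for whatever distributed hash function MemFS uses, and a plain dictionary stands in for per-node memory.

```python
import hashlib

STRIPE_SIZE = 4  # bytes; deliberately tiny, an assumed value for illustration


def node_for_stripe(path, stripe_index, nodes):
    """Pick a node for one stripe by hashing (path, stripe index).

    Placement depends only on the key, not on which node wrote the data,
    so every node can compute it independently (no locality involved).
    """
    key = f"{path}:{stripe_index}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def stripe_file(path, data, nodes, stripe_size=STRIPE_SIZE):
    """Split data into fixed-size stripes and assign each to a node."""
    placement = {node: [] for node in nodes}
    for offset in range(0, len(data), stripe_size):
        index = offset // stripe_size
        node = node_for_stripe(path, index, nodes)
        placement[node].append((index, data[offset:offset + stripe_size]))
    return placement


def reassemble(placement):
    """Gather stripes from all nodes and join them in stripe order."""
    stripes = [s for chunks in placement.values() for s in chunks]
    return b"".join(chunk for _, chunk in sorted(stripes))


if __name__ == "__main__":
    nodes = [f"node{i}" for i in range(4)]
    data = bytes(range(256)) * 4
    placement = stripe_file("/work/intermediate.dat", data, nodes)
    for node, chunks in placement.items():
        print(node, len(chunks), "stripes")
    print("round trip ok:", reassemble(placement) == data)
```

Because stripe placement is a pure function of the file path and stripe index, any task on any node can locate a stripe without consulting a central directory, which is the property that decouples task placement from data placement in this sketch.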
Pages: 144-158
Page count: 15