Automatic compiler-inserted I/O prefetching for out-of-core applications

Authors
Mowry, TC
Demke, AK
Krieger, O
DOI
10.1145/238721.238734
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Current operating systems offer poor performance when a numeric application's working set does not fit in main memory. As a result, programmers who wish to solve "out-of-core" problems efficiently are typically faced with the onerous task of rewriting an application to use explicit I/O operations (e.g., read/write). In this paper, we propose and evaluate a fully automatic technique which liberates the programmer from this task, provides high performance, and requires only minimal changes to current operating systems. In our scheme, the compiler provides the crucial information on future access patterns without burdening the programmer, the operating system supports non-binding prefetch and release hints for managing I/O, and the operating system cooperates with a run-time layer to accelerate performance by adapting to dynamic behavior and minimizing prefetch overhead. This approach maintains the abstraction of unlimited virtual memory for the programmer, gives the compiler the flexibility to aggressively move prefetches back ahead of references, and gives the operating system the flexibility to arbitrate between the competing resource demands of multiple applications. We have implemented our scheme using the SUIF compiler and the Hurricane operating system. Our experimental results demonstrate that our fully automatic scheme effectively hides the I/O latency in out-of-core versions of the entire NAS Parallel benchmark suite, thus resulting in speedups of roughly twofold for five of the eight applications, with two applications speeding up by threefold or more.
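The non-binding prefetch and release hints described in the abstract can be illustrated with a small sketch. This is not the paper's SUIF/Hurricane implementation: it assumes a present-day system where Python's mmap.madvise (Python 3.8+) is available and uses MADV_WILLNEED and MADV_DONTNEED as stand-ins for the prefetch and release hints. The hints are placed by hand here, whereas in the paper the compiler inserts them. The names sum_out_of_core and PREFETCH_DISTANCE and the block size are hypothetical choices for illustration only.

```python
import mmap
import struct

# Assumption: offsets passed to madvise must be aligned to this granularity.
PAGE = mmap.ALLOCATIONGRANULARITY
# Hypothetical tuning knob: how many blocks the prefetch runs ahead of use.
PREFETCH_DISTANCE = 4


def _hint(mm, advice_name, start, length):
    """Issue a non-binding hint; skip it quietly if the platform lacks it."""
    advice = getattr(mmap, advice_name, None)
    if advice is None or not hasattr(mm, "madvise"):
        return
    if start >= len(mm):
        return  # nothing left to hint about past the end of the mapping
    length = min(length, len(mm) - start)
    try:
        mm.madvise(advice, start, length)
    except OSError:
        pass  # a hint may be refused; correctness does not depend on it


def sum_out_of_core(path, block=PAGE * 16):
    """Sum little-endian int64 values in a file through a memory mapping.

    The file size is assumed to be a non-zero multiple of 8 bytes.
    """
    total = 0
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Prologue: request the first few blocks before the loop starts.
        _hint(mm, "MADV_WILLNEED", 0, block * PREFETCH_DISTANCE)
        for off in range(0, len(mm), block):
            # Prefetch hint for the block needed PREFETCH_DISTANCE steps ahead.
            _hint(mm, "MADV_WILLNEED", off + block * PREFETCH_DISTANCE, block)
            chunk = mm[off:off + block]
            total += sum(struct.unpack(f"<{len(chunk) // 8}q", chunk))
            # Release hint: this block will not be referenced again.
            _hint(mm, "MADV_DONTNEED", off, block)
    return total
```

Because both hints are advisory, the loop computes the same result whether or not the operating system acts on them; the program still reads the data as ordinary memory, which mirrors the abstract's point that the virtual-memory abstraction is preserved. The effect on I/O latency depends on the platform and is not measured here.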
Pages: 3-17
Page count: 15
Related papers
50 items in total
  • [41] Hiding I/O Latency with Pre-execution Prefetching for Parallel Applications
    Chen, Yong
    Byna, Surendra
    Sun, Xian-He
    Thakur, Rajeev
    Gropp, William
    INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS, 2008, : 242 - +
  • [42] SSD Accelerated Parallel Out-of-Core Higher-Order Method of Moments and Its Large Applications
    Lin, Zhongchao
    Zuo, Sheng
    Zhao, Xunwang
    Zhang, Yu
    Wu, Weijun
    APPLIED COMPUTATIONAL ELECTROMAGNETICS SOCIETY JOURNAL, 2018, 33 (09): : 943 - 950
  • [43] Automatic Cloud I/O Configurator for I/O Intensive Parallel Applications
    Zhai, Jidong
    Liu, Mingliang
    Jin, Ye
    Ma, Xiaosong
    Chen, Wenguang
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2015, 26 (12) : 3275 - 3288
  • [44] DI-MMAP-a scalable memory-map runtime for out-of-core data-intensive applications
    Van Essen, Brian
    Hsieh, Henry
    Ames, Sasha
    Pearce, Roger
    Gokhale, Maya
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2015, 18 (01): : 15 - 28
  • [45] DI-MMAP—a scalable memory-map runtime for out-of-core data-intensive applications
    Brian Van Essen
    Henry Hsieh
    Sasha Ames
    Roger Pearce
    Maya Gokhale
    Cluster Computing, 2015, 18 : 15 - 28
  • [46] Automatic Generation of I/O Kernels for HPC Applications
    Behzad, Babak
    Hoang-Vu Dang
    Hariri, Farah
    Zhang, Weizhe
    Snir, Marc
    2014 9TH PARALLEL DATA STORAGE WORKSHOP (PDSW), 2014, : 31 - 36
  • [47] Improving I/O performance of applications through compiler-directed code restructuring
    Kandemir, Mahmut
    Son, Seung Woo
    Karakoy, Mustafa
    PROCEEDINGS OF THE 6TH USENIX CONFERENCE ON FILE AND STORAGE TECHNOLOGIES (FAST '08), 2008, : 159 - +
  • [48] ACIC: Automatic Cloud I/O Configurator for HPC Applications
    Liu, Mingliang
    Jin, Ye
    Zhai, Jidong
    Zhai, Yan
    Shi, Qianqian
    Ma, Xiaosong
    Chen, Wenguang
    2013 INTERNATIONAL CONFERENCE FOR HIGH PERFORMANCE COMPUTING, NETWORKING, STORAGE AND ANALYSIS (SC), 2013,
  • [49] Automatic generation of benchmarks for I/O-intensive parallel applications
    Hao, Meng
    Zhang, Weizhe
    Zhang, You
    Snir, Marc
    Yang, Laurence T.
    JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 2019, 124 : 1 - 13
  • [50] Automatic management of CPU and I/O bottlenecks in distributed applications on ATM networks
    Nurmi, MA
    Bejcek, WE
    Gregoire, R
    Liu, KC
    Pohl, MD
    PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL SYMPOSIUM ON HIGH PERFORMANCE DISTRIBUTED COMPUTING, 1996, : 481 - 489