Automating and Optimizing Data Transfers for Many-core Coprocessors

Cited by: 0
Authors
Ren, Bin [1]
Ravi, Nishkam [2]
Yang, Yi [2]
Feng, Min [2]
Agrawal, Gagan [1]
Chakradhar, Srimat [2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] NEC Labs Amer, Princeton, NJ USA
Keywords
Coprocessors; Static Analysis; Runtime Analysis; Offloading
DOI
10.1145/2597652.2600114
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Manually orchestrating data transfers between CPUs and a coprocessor is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions to this problem, with the overarching goal of improving programmer productivity while maintaining performance. We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation and evaluated it on multiple C benchmarks. Our experimental results demonstrate that this compile-time solution can run 2.5x-5x faster than the original runtime solution, and that the CPU-coprocessor code using it can achieve a 1.5x-2.5x speedup over the 16-thread CPU version.
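The abstract names linearization with pointer reset but gives no details, so the following is only a minimal host-side sketch of the general idea, not the authors' transformation. It assumes a two-level `double **` array with separately allocated rows. The helper names `linearize_rows` and `reset_row_pointers` are hypothetical. The rows are packed into one contiguous buffer (which could then be moved in a single bulk transfer), and a fresh row-pointer table is rebuilt over that buffer so that `a[i][j]` indexing keeps working.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a rows x cols array whose rows are separately
 * allocated into one contiguous buffer. Returns NULL if allocation fails.
 * The caller owns the returned buffer. */
double *linearize_rows(double **a, size_t rows, size_t cols)
{
    assert(a != NULL && rows > 0 && cols > 0);
    double *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL)
        return NULL;
    for (size_t i = 0; i < rows; ++i)
        memcpy(flat + i * cols, a[i], cols * sizeof *flat);
    return flat;
}

/* Hypothetical helper ("pointer reset"): build a new row-pointer table
 * that points into the contiguous buffer, so two-level indexing b[i][j]
 * still works on the linearized data. Returns NULL if allocation fails.
 * The caller owns the returned table (but not a second copy of the data). */
double **reset_row_pointers(double *flat, size_t rows, size_t cols)
{
    assert(flat != NULL && rows > 0 && cols > 0);
    double **table = malloc(rows * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < rows; ++i)
        table[i] = flat + i * cols;
    return table;
}
```

In an actual offload setting the pointer table would be rebuilt on the coprocessor side, because host addresses are not valid there. This sketch runs entirely on the host and only illustrates the data layout.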
Pages: 177 - 177
Page count: 1
Related papers
50 in total
  • [21] Sesame: A User-Transparent Optimizing Framework for Many-Core Processors
    Fang, Jianbin
    Varbanescu, Ana Lucia
    Sips, Henk
    PROCEEDINGS OF THE 2013 13TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID 2013), 2013, : 70 - 73
  • [22] A Many-core Architecture for In-Memory Data Processing
    Agrawal, Sandeep R.
    Idicula, Sam
    Raghavan, Arun
    Vlachos, Evangelos
    Govindaraju, Venkatraman
    Varadarajan, Venkatanathan
    Balkesen, Cagri
    Giannikis, Georgios
    Roth, Charlie
    Agarwal, Nipun
    Sedlar, Eric
    50TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE (MICRO), 2017, : 245 - 258
  • [23] Optimizing Cache Locality for Irregular Data Accesses on Many-Core Intel Xeon Phi Accelerator Chip
    Nhat-Phuong Tran
    Choi, Dong Hoon
    Lee, Myungho
    2014 IEEE INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, 2014 IEEE 6TH INTL SYMP ON CYBERSPACE SAFETY AND SECURITY, 2014 IEEE 11TH INTL CONF ON EMBEDDED SOFTWARE AND SYST (HPCC,CSS,ICESS), 2014, : 153 - 156
  • [24] Value and Energy Optimizing Dynamic Resource Allocation in Many-core HPC Systems
    Singh, Amit Kumar
    Dziurzanski, Piotr
    Indrusiak, Leandro Soares
    2015 IEEE 7TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGY AND SCIENCE (CLOUDCOM), 2015, : 180 - 185
  • [25] Optimizing Memory Bandwidth in OpenVX Graph Execution on Embedded Many-Core Accelerators
    Tagliavini, Giuseppe
    Haugou, Germain
    Benini, Luca
    PROCEEDINGS OF THE 2014 CONFERENCE ON DESIGN AND ARCHITECTURES FOR SIGNAL AND IMAGE PROCESSING, 2014,
  • [26] An Adaptive Non-Uniform Loop Tiling for DMA-based Bulk Data Transfers on Many-Core Processor
    Qiu, Keni
    Ni, Yuanhui
    Zhang, Weigong
    Wang, Jing
    Wu, Xiaoqiang
    Xue, Chun Jason
    Li, Tao
    PROCEEDINGS OF THE 34TH IEEE INTERNATIONAL CONFERENCE ON COMPUTER DESIGN (ICCD), 2016, : 9 - 16
  • [27] Optimizing massively parallel sparse matrix computing on ARM many-core processor
    Zheng, Jiang
    Jiang, Jiazhi
    Du, Jiangsu
    Huang, Dan
    Lu, Yutong
    PARALLEL COMPUTING, 2023, 117
  • [28] Efficient Distributed Data Structures for Future Many-core Architectures
    Fatourou, Panagiota
    Kallimanis, Nikolaos D.
    Kanellou, Eleni
    Makridakis, Odysseas
    Symeonidou, Christi
    2016 IEEE 22ND INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED SYSTEMS (ICPADS), 2016, : 835 - 842
  • [29] Full-Stack Optimizing Transformer Inference on ARM Many-Core CPU
    Jiang, Jiazhi
    Du, Jiangsu
    Huang, Dan
    Chen, Zhiguang
    Lu, Yutong
    Liao, Xiangke
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2023, 34 (07) : 2221 - 2235
  • [30] Optimizing memory bandwidth exploitation for OpenVX applications on embedded many-core accelerators
    Tagliavini, Giuseppe
    Haugou, Germain
    Marongiu, Andrea
    Benini, Luca
    JOURNAL OF REAL-TIME IMAGE PROCESSING, 2018, 15 (01) : 73 - 92