Automating and Optimizing Data Transfers for Many-core Coprocessors

Cited by: 0
|
Authors
Ren, Bin [1]
Ravi, Nishkam [2]
Yang, Yi [2]
Feng, Min [2]
Agrawal, Gagan [1]
Chakradhar, Srimat [2]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
[2] NEC Labs Amer, Princeton, NJ USA
Keywords
Coprocessors; Static Analysis; Runtime Analysis; Offloading
DOI
10.1145/2597652.2600114
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Manually orchestrating data transfers between CPUs and a coprocessor is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance. We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation, and evaluated our work on multiple C benchmarks. Our experimental results demonstrate that our best compile-time solution can perform 2.5x-5x faster than the original runtime solution, and that the CPU-coprocessor code using it can achieve a 1.5x-2.5x speedup over the 16-thread CPU version.
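The abstract names partial linearization with pointer reset but does not describe how it works. The C sketch below shows one plausible reading of the general idea only, and is not the authors' implementation: a two-dimensional array stored as separately allocated rows is copied into one contiguous block (which could then be moved in a single transfer), and a row-pointer table is rebuilt over that block so that `a[i][j]`-style accesses keep working. The helper names `linearize_rows` and `reset_row_pointers` are hypothetical, and no actual coprocessor transfer is performed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy a rows x cols array stored as separately
 * allocated rows (double **) into one contiguous block.
 * Returns NULL if allocation fails; the caller frees the result. */
double *linearize_rows(double **a, size_t rows, size_t cols) {
    assert(a != NULL && rows > 0 && cols > 0);
    double *flat = malloc(rows * cols * sizeof *flat);
    if (flat == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < rows; i++) {
        /* Row i lands at offset i * cols in the contiguous block. */
        memcpy(flat + i * cols, a[i], cols * sizeof *flat);
    }
    return flat;
}

/* Hypothetical helper: rebuild the row-pointer table so that each
 * entry points into the contiguous block ("pointer reset").
 * Returns NULL if allocation fails; the caller frees the table. */
double **reset_row_pointers(double *flat, size_t rows, size_t cols) {
    assert(flat != NULL && rows > 0 && cols > 0);
    double **view = malloc(rows * sizeof *view);
    if (view == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < rows; i++) {
        view[i] = flat + i * cols;
    }
    return view;
}
```

Under these assumptions, only the contiguous block would need to cross the CPU-coprocessor boundary; the pointer table can be recomputed on the receiving side from the block's base address and the array dimensions.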
Pages: 177 / 177
Number of pages: 1
Related papers
50 items in total
  • [1] A Skew-Insensitive Hashing Sync and Construction Scheme for Many-Core Coprocessors
    Zhou, Kailai
    Chen, Hong
    Sun, Hui
    Li, Cuiping
    2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1334 - 1341
  • [2] Optimizing the gravitational tree algorithm for many-core processors
    Tokuue, Tomoyuki
    Ishiyama, Tomoaki
    MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2024, 528 (01) : 821 - 832
  • [3] Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures
    Zhang, Peng
    Fang, Jianbin
    Yang, Canqun
    Huang, Chun
    Tang, Tao
    Wang, Zheng
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (08) : 1878 - 1896
  • [4] Characterizing and Optimizing Transformer Inference on ARM Many-core Processor
    Jiang, Jiazhi
    Du, Jiangsu
    Huang, Dan
    Li, Dongsheng
    Zheng, Jiang
    Lu, Yutong
    51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
  • [5] Parallel simulation of many-core processor and many-core clusters
    Lü, Huiwei
    Cheng, Yuan
    Bai, Lu
    Chen, Mingyu
    Fan, Dongrui
    Sun, Ninghui
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2013, 50 (05) : 1110 - 1117
  • [6] Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems
    Aggarwal, Karan
    Bondhugula, Uday
    INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2019), 2019, : 425 - 437
  • [7] Scaling and optimizing the Gysela code on a cluster of many-core processors
    Latu, Guillaume
    Asahi, Yuuichi
    Bigot, Julien
    Feher, Tamas
    Grandgirard, Virginie
    2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 466 - 473
  • [8] Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures
    Lin, Pei-Hung
    Yi, Qing
    Quinlan, Daniel
    Liao, Chunhua
    Yan, Yongqing
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2016, 2017, 10136 : 137 - 152
  • [9] Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture
    Garcia, Elkin
    Arteaga, Jaime
    Pavel, Robert
    Gao, Guang R.
    LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2013, 2014, 8664 : 237 - 251
  • [10] Fast Data Delivery for Many-Core Processors
    Bakhshalipour, Mohammad
    Lotfi-Kamran, Pejman
    Mazloumi, Abbas
    Samandi, Farid
    Naderan-Tahan, Mahmood
    Modarressi, Mehdi
    Sarbazi-Azad, Hamid
    IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (10) : 1416 - 1429