Automating and Optimizing Data Transfers for Many-core Coprocessors

Cited by: 0

Authors:

Ren, Bin [1]

Ravi, Nishkam [2]

Yang, Yi [2]

Feng, Min [2]

Agrawal, Gagan [1]

Chakradhar, Srimat [2]

Affiliations:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

[2] NEC Labs Amer, Princeton, NJ USA

Source:

PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON SUPERCOMPUTING, (ICS'14) | 2014

Keywords:

Coprocessors; Static Analysis; Runtime Analysis; Offloading

DOI:

10.1145/2597652.2600114

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Orchestrating data transfers between CPUs and a coprocessor manually is cumbersome, particularly for multi-dimensional arrays and other data structures with multi-level pointers, which are common in scientific computations. This work describes a system that includes both compile-time and runtime solutions for this problem, with the overarching goal of improving programmer productivity while maintaining performance. We implemented our best compile-time solution, partial linearization with pointer reset, as a source-to-source transformation, and evaluated our work on multiple C benchmarks. Our experimental results demonstrate that our best compile-time solution can perform 2.5x-5x faster than the original runtime solution, and that the CPU-coprocessor code using it can achieve a 1.5x-2.5x speedup over the 16-thread CPU version.


Pages: 177 / 177

Page count: 1

50 records in total

[1] A Skew-Insensitive Hashing Sync and Construction Scheme for Many-Core Coprocessors
Zhou, Kailai
Chen, Hong
Sun, Hui
Li, Cuiping
2016 IEEE TRUSTCOM/BIGDATASE/ISPA, 2016, : 1334 - 1341
[2] Optimizing the gravitational tree algorithm for many-core processors
Tokuue, Tomoyuki
Ishiyama, Tomoaki
MONTHLY NOTICES OF THE ROYAL ASTRONOMICAL SOCIETY, 2024, 528 (01) : 821 - 832
[3] Optimizing Streaming Parallelism on Heterogeneous Many-Core Architectures
Zhang, Peng
Fang, Jianbin
Yang, Canqun
Huang, Chun
Tang, Tao
Wang, Zheng
IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2020, 31 (08) : 1878 - 1896
[4] Characterizing and Optimizing Transformer Inference on ARM Many-core Processor
Jiang, Jiazhi
Du, Jiangsu
Huang, Dan
Li, Dongsheng
Zheng, Jiang
Lu, Yutong
51ST INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING, ICPP 2022, 2022,
[5] Parallel simulation of many-core processor and many-core clusters
Lü, Huiwei
Cheng, Yuan
Bai, Lu
Chen, Mingyu
Fan, Dongrui
Sun, Ninghui
Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2013, 50 (05) : 1110 - 1117
[6] Optimizing the Linear Fascicle Evaluation Algorithm for Many-Core Systems
Aggarwal, Karan
Bondhugula, Uday
INTERNATIONAL CONFERENCE ON SUPERCOMPUTING (ICS 2019), 2019, : 425 - 437
[7] Scaling and optimizing the Gysela code on a cluster of many-core processors
Latu, Guillaume
Asahi, Yuuichi
Bigot, Julien
Feher, Tamas
Grandgirard, Virginie
2018 30TH INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE AND HIGH PERFORMANCE COMPUTING (SBAC-PAD 2018), 2018, : 466 - 473
[8] Automatically Optimizing Stencil Computations on Many-Core NUMA Architectures
Lin, Pei-Hung
Yi, Qing
Quinlan, Daniel
Liao, Chunhua
Yan, Yongqing
LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2016, 2017, 10136 : 137 - 152
[9] Optimizing the LU Factorization for Energy Efficiency on a Many-Core Architecture
Garcia, Elkin
Arteaga, Jaime
Pavel, Robert
Gao, Guang R.
LANGUAGES AND COMPILERS FOR PARALLEL COMPUTING, LCPC 2013, 2014, 8664 : 237 - 251
[10] Fast Data Delivery for Many-Core Processors
Bakhshalipour, Mohammad
Lotfi-Kamran, Pejman
Mazloumi, Abbas
Samandi, Farid
Naderan-Tahan, Mahmood
Modarressi, Mehdi
Sarbazi-Azad, Hamid
IEEE TRANSACTIONS ON COMPUTERS, 2018, 67 (10) : 1416 - 1429
