Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures

Cited by: 7

Authors:

Yin, Shouyi [1]

Lin, Xinhan [1]

Liu, Leibo [2]

Wei, Shaojun [1]

Affiliations:

[1] Tsinghua Univ, Inst Microelect, Beijing, Peoples R China

[2] Tsinghua Univ, Inst Microelect, Natl Lab Informat Sci & Technol, Beijing, Peoples R China

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2016 / Vol. 27 / Issue 11

Keywords:

CGRA; software pipelining; imperfect nested loop; sibling inner loops; outer-level pipelining; kernel compression;

DOI:

10.1109/TPDS.2016.2531678

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

Coarse-grained reconfigurable architecture (CGRA) is a promising parallel computing platform that provides high performance, high power efficiency, and flexibility. However, for imperfect nested loops, existing loop mapping methods often result in low execution performance and poor hardware utilization. To tackle this problem, this paper makes three contributions: 1) a highly effective and general approach to mapping imperfect loops onto CGRAs; 2) a global optimization strategy to search for the optimal initiation intervals (IIs); 3) a powerful kernel compression method to reduce the oversized kernel. Experimental results show that our approach can reduce the total computing latency by 20.5, 58.5, and 73.2 percent compared to the state-of-the-art approaches on 2 x 2, 4 x 4, and 8 x 8 CGRAs, respectively. Moreover, the compilation time and configuration context size are acceptable in practice.


Pages: 3199 - 3213

Page count: 15

50 items in total

[1] Exploiting Parallelism of Imperfect Nested Loops with Sibling Inner Loops on Coarse-Grained Reconfigurable Architectures
Lin, Xinhan
Yin, Shouyi
Liu, Leibo
Wei, Shaojun
2016 21ST ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC), 2016 : 456 - 461
[2] Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures
Liu, Dajiang
Yin, Shouyi
Liu, Leibo
Wei, Shaojun
2014 IEEE 22ND ANNUAL INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES (FCCM 2014), 2014 : 32 - 32
[3] Mapping Imperfect Loops to Coarse-Grained Reconfigurable Architectures
Sim, Hyeonuk
Lee, Hongsik
Seo, Seongseok
Lee, Jongeun
IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, 2016, 35 (07) : 1092 - 1104
[4] Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling
Mei, BF
Vernalde, S
Verkest, D
De Man, H
Lauwereins, R
DESIGN, AUTOMATION AND TEST IN EUROPE CONFERENCE AND EXHIBITION, PROCEEDINGS, 2003 : 296 - 301
[5] Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling
Mei, B
Vernalde, S
Verkest, D
De Man, H
Lauwereins, R
IEE PROCEEDINGS-COMPUTERS AND DIGITAL TECHNIQUES, 2003, 150 (05) : 255 - 261
[6] Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures
Yin, Shouyi
Liu, Dajiang
Peng, Yu
Liu, Leibo
Wei, Shaojun
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2016, 24 (02) : 507 - 520
[7] An algorithm for mapping loops onto coarse-grained reconfigurable architectures
Lee, JE
Choi, K
Dutt, ND
ACM SIGPLAN NOTICES, 2003, 38 (07) : 183 - 188
[8] Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures
Liu, Dajiang
Yin, Shouyi
Peng, Yu
Liu, Leibo
Wei, Shaojun
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, 2015, 23 (11) : 2581 - 2594
[9] Dependence analysis and extraction of coarse-grained parallelism for parameterized perfectly-nested loops
Bielecki, Wlodzimierz
Kraska, Krzysztof
Poliwoda, Maciej
PRZEGLAD ELEKTROTECHNICZNY, 2012, 88 (10B) : 231 - 234
[10] Extracting Coarse-Grained Parallelism for Affine Perfectly Nested Quasi-uniform Loops
Bielecki, Wlodzimierz
Kraska, Krzysztof
PARALLEL PROCESSING AND APPLIED MATHEMATICS, PT I, 2012, 7203 : 307 - 316
