Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures

Cited by: 7
Authors
Yin, Shouyi [1]
Lin, Xinhan [1]
Liu, Leibo [2]
Wei, Shaojun [1]
Affiliations
[1] Tsinghua Univ, Inst Microelect, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Microelect, Natl Lab Informat Sci & Technol, Beijing, Peoples R China
Keywords
CGRA; software pipelining; imperfect nested loop; sibling inner loops; outer-level pipelining; kernel compression
DOI
10.1109/TPDS.2016.2531678
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Coarse-grained reconfigurable architecture (CGRA) is a promising parallel computing platform that provides high performance, high power efficiency and flexibility. However, for imperfect nested loops, existing loop mapping methods often result in low execution performance and poor hardware utilization. To tackle this problem, this paper makes three contributions: 1) a highly effective and general approach to mapping imperfect loops onto CGRAs; 2) a global optimization strategy to search for the optimal initiation intervals (IIs); and 3) a powerful kernel compression method to reduce oversized kernels. Experimental results show that our approach reduces the total computing latency by 20.5, 58.5 and 73.2 percent compared to state-of-the-art approaches on 2 × 2, 4 × 4 and 8 × 8 CGRAs, respectively. Moreover, the compilation time and configuration context size are acceptable in practice.
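The abstract does not state the paper's latency model. As a rough, hypothetical illustration of why imperfect nests with sibling inner loops are costly when only the inner loops are software-pipelined, the sketch below uses the textbook modulo-scheduling estimate (N - 1) * II + D for a loop with trip count N, initiation interval II and single-iteration schedule depth D. The names `InnerLoop`, `pipelined_latency`, `inner_only_latency` and `fill_drain_overhead` are illustrative assumptions, not the authors' code, and the model ignores the paper's outer-level pipelining and kernel compression.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class InnerLoop:
    trip_count: int  # N: iterations of this inner loop per outer iteration
    ii: int          # initiation interval (cycles between iteration starts)
    depth: int       # D: schedule length of a single iteration, in cycles


def pipelined_latency(loop: InnerLoop) -> int:
    """Cycles for one modulo-scheduled run of a loop: (N - 1) * II + D."""
    return (loop.trip_count - 1) * loop.ii + loop.depth


def inner_only_latency(outer_trips: int, siblings: list[InnerLoop]) -> int:
    """Assumed naive model: each sibling inner loop is pipelined on its own
    and re-entered on every outer iteration, so fill/drain repeats each time."""
    per_outer = sum(pipelined_latency(loop) for loop in siblings)
    return outer_trips * per_outer


def fill_drain_overhead(outer_trips: int, siblings: list[InnerLoop]) -> int:
    """Cycles spent beyond the steady-state N * II, i.e. (D - II) per run."""
    return outer_trips * sum(loop.depth - loop.ii for loop in siblings)
```

For example, with two sibling inner loops `InnerLoop(8, 2, 6)` and `InnerLoop(4, 1, 5)` inside an outer loop of 16 iterations, `inner_only_latency` gives 448 cycles, of which `fill_drain_overhead` accounts for 128. Overlapping work across outer iterations and choosing the IIs of all sibling loops jointly, the problems the abstract names, target exactly this kind of repeated overhead.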
Pages: 3199-3213
Number of pages: 15