Reducing Memory Access Conflicts with Loop Transformation and Data Reuse on Coarse-grained Reconfigurable Architecture

Cited by: 3
Authors
Chen, Yuge [1]
Zhao, Zhongyuan [1, 2]
Jiang, Jianfei [1]
He, Guanghui [1]
Mao, Zhigang [1]
Sheng, Weiguang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai, Peoples R China
[2] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14850 USA
Keywords
CGRA; multi-bank memory; data reuse; spatial mapping
DOI
10.23919/DATE51398.2021.9473971
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising accelerators because of their low power consumption and high energy efficiency. In recent years, many research works have focused on improving the programmability of CGRAs by enabling fast reconfiguration during execution. The performance of these CGRAs hinges critically on the scheduling power of the compiler. One of the critical challenges is to reduce memory access conflicts using static compilation techniques. A memory access conflict introduces synchronization overhead, which stalls the pipeline and reduces CGRA performance. Existing compilers usually tackle this challenge by orchestrating data placement in the on-chip global memory (OGM) of the CGRA so that parallel memory accesses avoid bank conflicts. However, we find that bank conflicts are not the only cause of memory access conflicts. In some CGRAs, the bandwidth of the data network between the OGM and the processing element array (PEA) is also limited because of the low-power design principle. Unbalanced network bandwidth load is another cause of memory access conflicts. Furthermore, redundant data access across iterations is one of the primary causes of memory access conflicts. Based on these observations, we provide a comprehensive and generalized compilation flow to reduce memory conflicts. First, we develop a loop transformation model that maximizes the inter-iteration data reuse of loops, reducing the number of memory access operations under the software pipelining scheme. Second, we enhance the bandwidth utilization of the network between the OGM and the PEA and avoid bank conflicts by providing a conflict-aware spatial mapping algorithm that can be easily integrated into existing CGRA modulo scheduling compilation flows. Experimental results show that our method improves performance by an average of 44% compared with a state-of-the-art CGRA compilation flow.
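The effect of inter-iteration data reuse on bank conflicts can be illustrated with a toy model. The sketch below is not the paper's loop transformation model or mapping algorithm. It assumes a hypothetical 3-point stencil `b[i] = a[i] + a[i+1] + a[i+2]`, a two-bank memory with cyclic interleaving (`bank = address % NUM_BANKS`), and that all loads of one iteration issue in the same cycle. The names `NUM_BANKS`, `bank_of`, `conflicts`, `run_naive`, and `run_with_reuse` are invented for illustration.

```python
# Toy model only: every name and parameter here is an assumption,
# not taken from the paper.

NUM_BANKS = 2  # assumed number of memory banks


def bank_of(address):
    """Cyclic interleaving: consecutive addresses map to consecutive banks."""
    return address % NUM_BANKS


def conflicts(addresses):
    """Count extra serialized accesses when same-cycle loads share a bank."""
    per_bank = {}
    for addr in addresses:
        bank = bank_of(addr)
        per_bank[bank] = per_bank.get(bank, 0) + 1
    return sum(count - 1 for count in per_bank.values())


def run_naive(n_iters):
    """Every iteration reloads a[i], a[i+1], a[i+2] from memory."""
    loads, stalls = 0, 0
    for i in range(n_iters):
        addrs = [i, i + 1, i + 2]
        loads += len(addrs)
        stalls += conflicts(addrs)
    return loads, stalls


def run_with_reuse(n_iters):
    """After the first iteration, a[i] and a[i+1] are kept from the previous
    iteration (e.g. in registers), so only a[i+2] is loaded from memory."""
    loads, stalls = 0, 0
    for i in range(n_iters):
        addrs = [i, i + 1, i + 2] if i == 0 else [i + 2]
        loads += len(addrs)
        stalls += conflicts(addrs)
    return loads, stalls


if __name__ == "__main__":
    print("naive (loads, stalls):", run_naive(8))
    print("reuse (loads, stalls):", run_with_reuse(8))
```

In this model, `a[i]` and `a[i+2]` always fall in the same bank, so the naive loop incurs one conflict per iteration, while the reuse variant conflicts only in its first iteration. Real CGRA behavior also depends on the network bandwidth between the OGM and the PEA, which this sketch does not model.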
Pages: 124-129
Number of pages: 6
Related papers
50 in total
  • [1] Reducing Configuration Contexts for Coarse-grained Reconfigurable Architecture
    Yin, Shouyi
    Yin, Chongyong
    Liu, Leibo
    Zhu, Min
    Wang, Yansheng
    Wei, Shaojun
    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012), 2012, : 121 - 124
  • [2] Joint Loop Mapping and Data Placement for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory
    Yin, Shouyi
    Yao, Xianqing
    Lu, Tianyi
    Liu, Leibo
    Wei, Shaojun
    2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2016,
  • [3] Critical loop memory-aware mapping onto coarse-grained reconfigurable architecture
    Yang, Ziyu
    Zhao, Peng
    Wang, Dawei
    Li, Sikun
    Guofang Keji Daxue Xuebao/Journal of National University of Defense Technology, 2012, 34 (06): : 46 - 53
  • [4] A Reconfigurable Memory Architecture for System Integration of Coarse-Grained Reconfigurable Arrays
    Sousa, Ericles
    Tanase, Alexandru
    Hannig, Frank
    Teich, Juergen
    2017 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2017,
  • [5] MapReduce inspired loop mapping for coarse-grained reconfigurable architecture
    Yin ShouYi
    Shao ShengJia
    Liu LeiBo
    Wei ShaoJun
    SCIENCE CHINA-INFORMATION SCIENCES, 2014, 57 (12) : 1 - 14
  • [6] MapReduce inspired loop mapping for coarse-grained reconfigurable architecture
    ShouYi Yin
    ShengJia Shao
    LeiBo Liu
    ShaoJun Wei
    Science China Information Sciences, 2014, 57 : 1 - 14
  • [7] The Organization of On-Chip Data Memory in One Coarse-Grained Reconfigurable Architecture
    Wang, Yansheng
    Liu, Leibo
    Yin, Shouyi
    Zhu, Min
    Cao, Peng
    Yang, Jun
    Wei, Shaojun
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2013, E96A (11) : 2218 - 2229
  • [8] A Data Prefetch and Reuse Strategy for Coarse-Grained Reconfigurable Architectures
    Ge, Wei
    Qi, Zhi
    Du, Yue
    Ma, Lu
    Shi, Longxing
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2013, E96D (03): : 616 - 623
  • [9] Coarse Grained Reconfigurable Architecture Loop Mapping Algorithm Based on Memory Partitioning and Path Reuse
    Zhang Xingming
    Yuan Kaijian
    Gao Yanzhao
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2018, 40 (06) : 1520 - 1524
  • [10] Memory Access Optimization in Compilation for Coarse-Grained Reconfigurable Architectures
    Kim, Yongjoo
    Lee, Jongeun
    Shrivastava, Aviral
    Paek, Yunheung
    ACM TRANSACTIONS ON DESIGN AUTOMATION OF ELECTRONIC SYSTEMS, 2011, 16 (04)