Reducing Memory Access Conflicts with Loop Transformation and Data Reuse on Coarse-grained Reconfigurable Architecture

Cited by: 3
Authors
Chen, Yuge [1]
Zhao, Zhongyuan [1, 2]
Jiang, Jianfei [1]
He, Guanghui [1]
Mao, Zhigang [1]
Sheng, Weiguang [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai, Peoples R China
[2] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14850 USA
Keywords
CGRA; multi-bank memory; data reuse; spatial mapping;
DOI
10.23919/DATE51398.2021.9473971
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Coarse-Grained Reconfigurable Arrays (CGRAs) are promising accelerators because of their low power consumption and high energy efficiency. In recent years, many research works have focused on improving the programmability of CGRAs by enabling fast reconfiguration during execution. The performance of these CGRAs hinges critically on the scheduling power of the compiler. One of the critical challenges is to reduce memory access conflicts using static compilation techniques. A memory access conflict introduces synchronization overhead, which stalls the pipeline and reduces CGRA performance. Existing compilers usually tackle this challenge by orchestrating data placement in the on-chip global memory (OGM) of the CGRA so that parallel memory accesses avoid bank conflicts. However, we find that bank conflicts are not the only cause of memory access conflicts. In some CGRAs, the bandwidth of the data network between the OGM and the processing element array (PEA) is also limited because of low-power design principles. Unbalanced load on this network is another cause of memory access conflicts. Furthermore, redundant data accesses across iterations are one of the primary causes of memory access conflicts. Based on these observations, we provide a comprehensive and generalized compilation flow to reduce memory access conflicts. First, we develop a loop transformation model that maximizes the inter-iteration data reuse of loops to reduce the number of memory access operations under the software pipelining scheme. Second, we enhance the bandwidth utilization of the network between the OGM and the PEA and avoid bank conflicts by providing a conflict-aware spatial mapping algorithm that can be easily integrated into existing CGRA modulo scheduling compilation flows. Experimental results show that our method improves performance by an average of 44% compared with a state-of-the-art CGRA compilation flow.
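As a rough illustration of two ideas named in the abstract (bank conflicts and inter-iteration data reuse), the following minimal Python sketch counts loads and bank-conflict stalls for a sliding-window loop `y[i] = x[i] + ... + x[i + taps - 1]`. It is not the paper's loop transformation model or mapping algorithm. The cyclic bank interleaving, the one-access-per-bank-per-cycle rule, and the names `bank_of`, `conflict_stalls`, `loads_for_iteration`, and `simulate` are all assumptions made for this example.

```python
from collections import Counter


def bank_of(addr: int, num_banks: int) -> int:
    """Assumed cyclic interleaving: address a is stored in bank a mod num_banks."""
    return addr % num_banks


def conflict_stalls(addrs, num_banks: int) -> int:
    """Extra cycles when parallel accesses collide on a bank.

    Assumes each bank serves one access per cycle, so a group of
    parallel accesses takes as many cycles as the busiest bank has requests.
    """
    counts = Counter(bank_of(a, num_banks) for a in set(addrs))
    return max(counts.values()) - 1 if counts else 0


def loads_for_iteration(i: int, taps: int, reuse: bool):
    """Addresses that iteration i must load from memory."""
    window = list(range(i, i + taps))
    if reuse and i > 0:
        # Elements x[i] .. x[i + taps - 2] were already loaded by iteration i - 1
        # and are assumed to be forwarded between iterations, so only the
        # newest element is fetched from memory.
        return window[-1:]
    return window


def simulate(n_iters: int, taps: int, num_banks: int, reuse: bool):
    """Return (total loads, total conflict stalls) over n_iters iterations."""
    total_loads = 0
    total_stalls = 0
    for i in range(n_iters):
        addrs = loads_for_iteration(i, taps, reuse)
        total_loads += len(addrs)
        total_stalls += conflict_stalls(addrs, num_banks)
    return total_loads, total_stalls


if __name__ == "__main__":
    print("no reuse:", simulate(8, 3, 2, reuse=False))
    print("reuse:   ", simulate(8, 3, 2, reuse=True))
```

In this toy setting with 8 iterations, 3 taps, and 2 banks, reusing data across iterations lowers both the number of loads and the number of stalls, which is the intuition behind reducing memory access operations before mapping. A real CGRA compiler would also have to model the network bandwidth between the OGM and the PEA, which this sketch ignores.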
Pages: 124 - 129
Number of pages: 6
Related papers
50 in total
  • [21] Designing a coarse-grained reconfigurable architecture using loop self-pipelining
    Xu, Jinhui
    Wu, Guiming
    Dou, Yong
    Dong, Yazhuo
    ADVANCES IN COMPUTER SYSTEMS ARCHITECTURE, PROCEEDINGS, 2006, 4186 : 567 - 573
  • [22] Alleviating the data memory bandwidth bottleneck in coarse-grained reconfigurable arrays
    Dimitroulakos, G
    Galanis, MD
    Goutis, CE
    16TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURE AND PROCESSORS, PROCEEDINGS, 2005, : 161 - 168
  • [23] Register file architecture optimization in a coarse-grained reconfigurable architecture
    Kwok, Z
    Wilton, SJE
    FCCM 2005: 13TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2005, : 35 - 44
  • [24] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
    Yang Chen
    Liu LeiBo
    Yin ShouYi
    Wei ShaoJun
    SCIENCE CHINA-PHYSICS MECHANICS & ASTRONOMY, 2014, 57 (12) : 2214 - 2227
  • [25] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
    YANG Chen
    LIU Lei Bo
    YIN Shou Yi
    WEI Shao Jun
    Science China (Physics, Mechanics & Astronomy), 2014, (12) : 2214 - 2227
  • [26] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
    Chen Yang
    LeiBo Liu
    ShouYi Yin
    ShaoJun Wei
    Science China Physics, Mechanics & Astronomy, 2014, 57 : 2214 - 2227
  • [27] COARSE-GRAINED DYNAMICALLY RECONFIGURABLE ARCHITECTURE WITH FLEXIBLE RELIABILITY
    Alnajjar, Dawood
    Ko, Younghun
    Imagawa, Takashi
    Konoura, Hiroaki
    Hiromoto, Masayuki
    Mitsuyama, Yukio
    Hashimoto, Masanori
    Ochi, Hiroyuki
    Onoye, Takao
    FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 186 - +
  • [28] A coarse-grained reconfigurable architecture supporting flexible execution
    Hironaka, T
    Fukuda, T
    Goto, Y
    Tanigawa, K
    Kawasaki, T
    Kojima, A
    SEVENTH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND GRID IN ASIA PACIFIC REGION, PROCEEDINGS, 2004, : 448 - 449
  • [29] A New Array Fabric for Coarse-Grained Reconfigurable Architecture
    Kim, Yoonjin
    Mahapatra, Rabi N.
    11TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN - ARCHITECTURES, METHODS AND TOOLS : DSD 2008, PROCEEDINGS, 2008, : 584 - 591
  • [30] Design and Analysis of Layered Coarse-Grained Reconfigurable Architecture
    Rakossy, Zoltan Endre
    Naphade, Tejas
    Chattopadhyay, Anupam
    2012 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2012,