Reducing Memory Access Conflicts with Loop Transformation and Data Reuse on Coarse-grained Reconfigurable Architecture

Cited by: 3

Authors:

Chen, Yuge [1]

Zhao, Zhongyuan [1,2]

Jiang, Jianfei [1]

He, Guanghui [1]

Mao, Zhigang [1]

Sheng, Weiguang [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Micro Nano Elect, Shanghai, Peoples R China

[2] Cornell Univ, Sch Elect & Comp Engn, Ithaca, NY 14850 USA

Source:

PROCEEDINGS OF THE 2021 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2021) | 2021

Keywords:

CGRA; multi-bank memory; data reuse; spatial mapping;

DOI:

10.23919/DATE51398.2021.9473971

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Coarse-Grained Reconfigurable Arrays (CGRAs) are promising accelerators because of their low power consumption and high energy efficiency. In recent years, many research works have focused on improving the programmability of CGRAs by enabling fast reconfiguration during execution. The performance of these CGRAs hinges critically on the scheduling power of the compiler. One of the critical challenges is to reduce memory access conflicts using static compilation techniques. Memory access conflicts introduce synchronization overhead, which stalls the pipeline and reduces CGRA performance. Existing compilers usually tackle this challenge by orchestrating data placement in the on-chip global memory (OGM) of the CGRA so that parallel memory accesses avoid bank conflicts. However, we find that bank conflicts are not the only cause of memory access conflicts. In some CGRAs, the bandwidth of the data network between the OGM and the processing element array (PEA) is also limited because of low-power design principles. Unbalanced load on this network bandwidth is another cause of memory access conflicts. Furthermore, redundant data accesses across iterations are one of the primary causes of memory access conflicts. Based on these observations, we provide a comprehensive and generalized compilation flow to reduce memory conflicts. First, we develop a loop transformation model that maximizes inter-iteration data reuse in loops, reducing the number of memory access operations under the software pipelining scheme. Second, we improve the bandwidth utilization of the network between the OGM and the PEA and avoid bank conflicts with a conflict-aware spatial mapping algorithm that can be easily integrated into existing CGRA modulo scheduling compilation flows. Experimental results show that our method improves performance by an average of 44% compared with a state-of-the-art CGRA compilation flow.
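The two effects named in the abstract, bank conflicts and redundant loads across iterations, can be illustrated with a small sketch. It is not the paper's loop transformation model or mapping algorithm. The cyclic bank mapping (bank = address mod number of banks), the one-access-per-bank-per-cycle rule, the 3-point stencil loop, and all function names are assumptions chosen only for illustration.

```python
from collections import Counter


def bank_conflict_stalls(addresses, num_banks):
    """Extra cycles needed to serve one group of parallel accesses.

    Assumption: cyclic interleaving (bank = address % num_banks) and each
    bank serves one access per cycle, so the busiest bank sets the delay.
    """
    per_bank = Counter(addr % num_banks for addr in addresses)
    return max(per_bank.values()) - 1 if per_bank else 0


def stencil_naive(a):
    """y[i] = a[i] + a[i+1] + a[i+2], loading all three operands each iteration."""
    out, loads = [], 0
    for i in range(len(a) - 2):
        out.append(a[i] + a[i + 1] + a[i + 2])
        loads += 3
    return out, loads


def stencil_reuse(a):
    """Same loop, but two operands are carried over from the previous iteration."""
    n = len(a) - 2
    if n <= 0:
        return [], 0
    x0, x1 = a[0], a[1]  # loaded once before the loop
    out, loads = [], 2
    for i in range(n):
        x2 = a[i + 2]  # the only new load in this iteration
        loads += 1
        out.append(x0 + x1 + x2)
        x0, x1 = x1, x2  # shift the window: reuse across iterations
    return out, loads


if __name__ == "__main__":
    data = list(range(10))
    naive_out, naive_loads = stencil_naive(data)
    reuse_out, reuse_loads = stencil_reuse(data)
    print(naive_out == reuse_out, naive_loads, reuse_loads)
    print(bank_conflict_stalls([0, 4, 8, 1], 4))
    print(bank_conflict_stalls([0, 1, 2, 3], 4))
```

In this toy setting, fewer loads per iteration means fewer simultaneous requests competing for the same bank or network link, which is the intuition behind combining data reuse with conflict-aware placement.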

Pages: 124-129

Number of pages: 6

50 items in total

[21] Designing a coarse-grained reconfigurable architecture using loop self-pipelining
Xu, Jinhui
Wu, Guiming
Dou, Yong
Dong, Yazhuo
ADVANCES IN COMPUTER SYSTEMS ARCHITECTURE, PROCEEDINGS, 2006, 4186 : 567 - 573
[22] Alleviating the data memory bandwidth bottleneck in coarse-grained reconfigurable arrays
Dimitroulakos, G
Galanis, MD
Goutis, CE
16TH INTERNATIONAL CONFERENCE ON APPLICATION-SPECIFIC SYSTEMS, ARCHITECTURE AND PROCESSORS, PROCEEDINGS, 2005, : 161 - 168
[23] Register file architecture optimization in a coarse-grained reconfigurable architecture
Kwok, Z
Wilton, SJE
FCCM 2005: 13TH ANNUAL IEEE SYMPOSIUM ON FIELD-PROGRAMMABLE CUSTOM COMPUTING MACHINES, PROCEEDINGS, 2005, : 35 - 44
[24] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
Yang Chen
Liu LeiBo
Yin ShouYi
Wei ShaoJun
SCIENCE CHINA-PHYSICS MECHANICS & ASTRONOMY, 2014, 57 (12) : 2214 - 2227
[25] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
YANG Chen
LIU Lei Bo
YIN Shou Yi
WEI Shao Jun
Science China (Physics, Mechanics & Astronomy), 2014, (12) : 2214 - 2227
[26] Efficient and flexible memory architecture to alleviate data and context bandwidth bottlenecks of coarse-grained reconfigurable arrays
Chen Yang
LeiBo Liu
ShouYi Yin
ShaoJun Wei
Science China Physics, Mechanics & Astronomy, 2014, 57 : 2214 - 2227
[27] COARSE-GRAINED DYNAMICALLY RECONFIGURABLE ARCHITECTURE WITH FLEXIBLE RELIABILITY
Alnajjar, Dawood
Ko, Younghun
Imagawa, Takashi
Konoura, Hiroaki
Hiromoto, Masayuki
Mitsuyama, Yukio
Hashimoto, Masanori
Ochi, Hiroyuki
Onoye, Takao
FPL: 2009 INTERNATIONAL CONFERENCE ON FIELD PROGRAMMABLE LOGIC AND APPLICATIONS, 2009, : 186 - +
[28] A coarse-grained reconfigurable architecture supporting flexible execution
Hironaka, T
Fukuda, T
Goto, Y
Tanigawa, K
Kawasaki, T
Kojima, A
SEVENTH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND GRID IN ASIA PACIFIC REGION, PROCEEDINGS, 2004, : 448 - 449
[29] A New Array Fabric for Coarse-Grained Reconfigurable Architecture
Kim, Yoonjin
Mahapatra, Rabi N.
11TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN - ARCHITECTURES, METHODS AND TOOLS : DSD 2008, PROCEEDINGS, 2008, : 584 - 591
[30] Design and Analysis of Layered Coarse-Grained Reconfigurable Architecture
Rakossy, Zoltan Endre
Naphade, Tejas
Chattopadhyay, Anupam
2012 INTERNATIONAL CONFERENCE ON RECONFIGURABLE COMPUTING AND FPGAS (RECONFIG), 2012,