Efficient Memory Partitioning for Parallel Data Access in FPGA via Data Reuse

Cited by: 3
Authors
Su, Jincheng [1]
Yang, Fan [1]
Zeng, Xuan [1]
Zhou, Dian [2,3]
Chen, Jie [4]
Affiliations
[1] Fudan Univ, Sch Microelect, State Key Lab ASIC & Syst, Shanghai 201203, Peoples R China
[2] Fudan Univ, Shanghai 201203, Peoples R China
[3] Univ Texas Dallas, Richardson, TX 75080 USA
[4] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB T6G 2R3, Canada
Funding
US National Science Foundation
Keywords
Data reuse; field-programmable gate array (FPGA); high-level synthesis (HLS); loop transformation; memory partition; affine scheduling problem; optimization
DOI
10.1109/TCAD.2017.2648838
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Parallelizing the memory accesses in a nested loop is a critical challenge in enabling loop pipelining. An effective approach in high-level synthesis (HLS) for field-programmable gate arrays (FPGAs) is to map these accesses to multiple on-chip memory banks using a memory partitioning technique. In this paper, we propose an efficient memory partitioning algorithm with low overhead and low time complexity that achieves parallel data access via data reuse. We observe that, in most image and video processing applications, a large amount of data can be reused across different iterations of a loop nest. Motivated by this observation, we cache the reusable data in on-chip registers organized as register chains, and a memory partitioning algorithm then distributes the nonreusable data across several memory banks. We also revise the existing padding method to cover a case that occurs frequently in our method, in which certain components of the partition vector are zero. Experimental results on a wide range of access patterns extracted from image and video processing applications demonstrate that, compared with state-of-the-art algorithms, the proposed method is efficient in terms of execution time, resource overhead, and power consumption. For the tested patterns, the execution time is typically less than one millisecond, and the number of required memory banks is reduced by 59.7% on average. This leads to average reductions of 78.2% in look-up tables (LUTs), 65.5% in flip-flops (FFs), and 37.1% in DSP48Es, and consequently a 74.8% reduction in dynamic power consumption. Moreover, the proposed method incurs zero storage overhead for most of the widely used access patterns in image filtering.
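The abstract combines two ideas: serve reusable data from register chains, and partition only the nonreusable accesses into memory banks. The Python sketch below illustrates why this can shrink the bank count. It rests on assumptions of our own, not on the paper's algorithm: a 3x3 filter window, a unit step along the innermost loop, the common linear-transformation-based mapping bank = (α · x) mod N, and a brute-force search over small partition vectors α. The names `WINDOW_3X3`, `nonreusable_offsets`, `bank_of`, and `min_banks` are hypothetical, and intra-bank addressing, padding, and loop prologue/boundary iterations are omitted.

```python
from itertools import product

Offset = tuple[int, int]  # (row, column) offset relative to the loop's base index (i, j)

# Hypothetical example pattern: a 3x3 image-filter window around (i, j).
WINDOW_3X3: list[Offset] = list(product((-1, 0, 1), repeat=2))


def nonreusable_offsets(pattern: list[Offset], step: Offset) -> list[Offset]:
    """Return offsets whose element was NOT read in the previous iteration.

    Assumption: consecutive iterations shift the window by `step`, so the element
    read through offset r now was read through r + step one iteration earlier.
    Elements that were read before could be held in registers instead of memory.
    """
    seen = set(pattern)
    return [r for r in pattern if (r[0] + step[0], r[1] + step[1]) not in seen]


def bank_of(x: Offset, alpha: Offset, n_banks: int) -> int:
    """Linear-transformation-based bank index: (alpha . x) mod N."""
    return (alpha[0] * x[0] + alpha[1] * x[1]) % n_banks


def min_banks(pattern: list[Offset], max_coeff: int = 4,
              max_banks: int = 64) -> tuple[int, Offset]:
    """Brute-force the smallest N (and a vector alpha) giving conflict-free access.

    Moving the base index adds the same constant (alpha . base) to every bank
    index modulo N, which preserves distinctness, so checking the offsets once
    covers every loop iteration.
    """
    m = len(pattern)
    for n in range(max(m, 1), max_banks + 1):  # need at least one bank per access
        for alpha in product(range(max_coeff + 1), repeat=2):
            if len({bank_of(r, alpha, n) for r in pattern}) == m:
                return n, alpha
    raise ValueError("no conflict-free mapping in the search range")


if __name__ == "__main__":
    fresh = nonreusable_offsets(WINDOW_3X3, step=(0, 1))
    print("all 9 accesses:", min_banks(WINDOW_3X3))  # (9, (1, 3))
    print("non-reusable:", fresh)                    # [(-1, 1), (0, 1), (1, 1)]
    print("after reuse:", min_banks(fresh))          # (3, (1, 0))
```

Under these assumptions, partitioning all nine window accesses needs 9 banks (α = (1, 3)), whereas only the three elements of the newly entered column must come from memory, and they fit in 3 banks with α = (1, 0). That second vector has a zero component, which is the kind of case the abstract says the revised padding method is meant to cover.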
Pages: 1674-1687
Number of pages: 14