Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Cited by: 8
Authors
Liu, Qiaoyi [1]
Setter, Jeff [1]
Huff, Dillon [1]
Strange, Maxwell [1]
Feng, Kathleen [1]
Horowitz, Mark [1]
Raina, Priyanka [1]
Kjolstad, Fredrik [1]
Affiliation
[1] Gates Comp Sci, 353 Serra Mall, Stanford, CA 94305 USA
Keywords
Hardware accelerators; memory abstraction; polyhedral analysis; machine learning; language
DOI
10.1145/3572908
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground, but they have traditionally been difficult compiler targets because they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the compiler represent these memories directly by combining storage with address generation and control logic in a single structure: a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer, uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7x better runtime and 3.5x better energy efficiency compared to an FPGA.
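The push-memory idea in the abstract, storage combined with its own address generation and control logic, can be illustrated with a small Python sketch. This is not the paper's unified buffer or its compiler representation; the class name `PushBufferSketch`, the helper `stream_windows`, the circular-storage layout, and the 1D sliding-window schedule are all assumptions made purely for illustration. The point it tries to show is that the consumer never issues an address: the buffer accepts an input stream and pushes output windows on its own schedule.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


class PushBufferSketch:
    """Toy push memory (hypothetical): storage + address generation + control."""

    def __init__(self, taps: int) -> None:
        self.taps = taps
        self.storage: List[int] = [0] * taps  # small circular storage
        self.cycle = 0                        # control logic: a cycle counter

    def _write_addr(self, cycle: int) -> int:
        # Write address generator: affine in the cycle, wrapped to capacity.
        return cycle % self.taps

    def _read_addrs(self, cycle: int) -> List[int]:
        # Read address generator: the last `taps` writes, oldest first.
        return [(cycle - k) % self.taps for k in range(self.taps - 1, -1, -1)]

    def push(self, value: int) -> Optional[Tuple[int, ...]]:
        """Accept one input element; emit a window once enough data has arrived."""
        self.storage[self._write_addr(self.cycle)] = value
        window = None
        if self.cycle >= self.taps - 1:  # schedule: output starts after warm-up
            window = tuple(self.storage[a] for a in self._read_addrs(self.cycle))
        self.cycle += 1
        return window


def stream_windows(values: Iterable[int], taps: int) -> Iterator[Tuple[int, ...]]:
    """Drive the buffer with an input stream and yield the windows it pushes out."""
    buf = PushBufferSketch(taps)
    for v in values:
        w = buf.push(v)
        if w is not None:
            yield w


if __name__ == "__main__":
    for w in stream_windows([1, 2, 3, 4, 5], taps=3):
        print(w)
```

In this sketch a downstream kernel, such as a 3-tap stencil, would simply consume each emitted window; swapping the storage for a different physical memory would leave the kernel unchanged, which loosely mirrors the separation between buffer abstraction and memory implementation that the abstract describes.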
Pages: 26