Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Cited by: 8
Authors
Liu, Qiaoyi [1 ]
Setter, Jeff [1 ]
Huff, Dillon [1 ]
Strange, Maxwell [1 ]
Feng, Kathleen [1 ]
Horowitz, Mark [1 ]
Raina, Priyanka [1 ]
Kjolstad, Fredrik [1 ]
Affiliations
[1] Stanford University, Gates Computer Science Building, 353 Serra Mall, Stanford, CA 94305, USA
Keywords
Hardware accelerators; memory abstraction; polyhedral analysis; machine learning
DOI
10.1145/3572908
Chinese Library Classification (CLC)
TP3 [Computing technology; computer technology]
Discipline Code
0812
Abstract
Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground, but they have traditionally been difficult compiler targets since they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the compiler represent these memories directly by combining storage with address generation and control logic in a single structure: a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer, uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7× better runtime and 3.5× better energy efficiency compared to an FPGA.
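The central idea of the unified buffer abstraction, storage coupled with explicit address generation and control for each port, can be illustrated with a small sketch. The Python code below is a hypothetical, simplified model written for this summary; the names AffineAddressGenerator and UnifiedBuffer are illustrative and are not taken from the paper or its compiler. It buffers a stream written through one affine access pattern and streams it back out through another, a stand-in for the reordering a push memory performs between producer and consumer kernels.

# Illustrative sketch only (not the authors' compiler IR): a "unified buffer"
# modeled as storage plus affine address generators for its write and read ports.

class AffineAddressGenerator:
    """Generates a stream of flat addresses from nested loop extents and strides."""
    def __init__(self, extents, strides, offset=0):
        self.extents = extents      # loop trip counts, outermost first
        self.strides = strides      # address stride per loop level
        self.offset = offset        # starting address

    def addresses(self):
        # Enumerate all loop iterations and map each to a flat address.
        def recurse(level, base):
            if level == len(self.extents):
                yield base
                return
            for i in range(self.extents[level]):
                yield from recurse(level + 1, base + i * self.strides[level])
        yield from recurse(0, self.offset)


class UnifiedBuffer:
    """Storage combined with per-port address generation and simple control."""
    def __init__(self, capacity, write_gen, read_gen):
        self.data = [0] * capacity
        self.write_gen = write_gen
        self.read_gen = read_gen

    def run(self, input_stream):
        # Write phase: consume the incoming stream into storage.
        for value, addr in zip(input_stream, self.write_gen.addresses()):
            self.data[addr] = value
        # Push phase: emit the output stream in the read port's order.
        return [self.data[addr] for addr in self.read_gen.addresses()]


# Example: buffer a 4x4 row-major input stream and push it out column-major,
# i.e., a transpose reordering between two kernels.
if __name__ == "__main__":
    write_gen = AffineAddressGenerator(extents=[4, 4], strides=[4, 1])
    read_gen = AffineAddressGenerator(extents=[4, 4], strides=[1, 4])
    ub = UnifiedBuffer(capacity=16, write_gen=write_gen, read_gen=read_gen)
    print(ub.run(list(range(16))))

In this toy model, separating the access patterns (the generators) from the storage mirrors how the abstraction lets generic push-memory optimizations be performed independently of the backend memory implementation.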
Pages: 26