Unified Buffer: Compiling Image Processing and Machine Learning Applications to Push-Memory Accelerators

Cited by: 8

Authors:

Liu, Qiaoyi [1]

Setter, Jeff [1]

Huff, Dillon [1]

Strange, Maxwell [1]

Feng, Kathleen [1]

Horowitz, Mark [1]

Raina, Priyanka [1]

Kjolstad, Fredrik [1]

Affiliations:

[1] Gates Comp Sci, 353 Serra Mall, Stanford, CA 94305 USA

Source:

ACM TRANSACTIONS ON ARCHITECTURE AND CODE OPTIMIZATION | 2023 / Volume 20 / Issue 02

Keywords:

Hardware accelerators; memory abstraction; polyhedral analysis; machine learning; LANGUAGE;

DOI:

10.1145/3572908

Chinese Library Classification number:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

Image processing and machine learning applications benefit tremendously from hardware acceleration. Existing compilers target either FPGAs, which sacrifice power and performance for programmability, or ASICs, which become obsolete as applications change. Programmable domain-specific accelerators, such as coarse-grained reconfigurable arrays (CGRAs), have emerged as a promising middle ground, but they have traditionally been difficult compiler targets since they use a different memory abstraction. In contrast to CPUs and GPUs, the memory hierarchies of domain-specific accelerators use push memories: memories that send input data streams to computation kernels or to higher or lower levels in the memory hierarchy and store the resulting output data streams. To address the compilation challenge caused by push memories, we propose that the representation of these memories in the compiler be altered to directly represent them by combining storage with address generation and control logic in a single structure: a unified buffer. The unified buffer abstraction enables the compiler to separate generic push memory optimizations from the mapping to specific memory implementations in the backend. This separation allows our compiler to map high-level Halide applications to different CGRA memory designs, including some with a ready-valid interface. The separation also opens the opportunity for optimizing push memory elements on reconfigurable arrays. Our optimized memory implementation, the Physical Unified Buffer, uses a wide-fetch, single-port SRAM macro with built-in address generation logic to implement a buffer with two read and two write ports. It is 18% smaller and consumes 31% less energy than a physical buffer implementation using a dual-port memory that only supports two ports. Finally, our system evaluation shows that enabling a compiler to support CGRAs leads to performance and energy benefits. Over a wide range of image processing and machine learning applications, our CGRA achieves 4.7x better runtime and 3.5x better energy efficiency compared to an FPGA.
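Illustrative note (not part of the published abstract): the idea of a push memory that bundles storage with its own address generation and control logic can be pictured with a small simulation. The sketch below is only a toy model under stated assumptions. The class names `AffineGen` and `UnifiedBufferSketch`, the affine form of the address and schedule generators, and the write-before-read ordering within a cycle are all hypothetical choices made for illustration; they are not taken from the paper's compiler or hardware.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class AffineGen:
    """Hypothetical affine generator: value(i) = offset + stride * i, 0 <= i < extent."""
    offset: int
    stride: int
    extent: int

    def values(self) -> List[int]:
        return [self.offset + self.stride * i for i in range(self.extent)]


class UnifiedBufferSketch:
    """Toy push memory: storage plus per-port address and schedule generators.

    Assumption: each port has one generator for addresses and one for the
    cycles at which it fires. The buffer drives itself; no external
    processor issues load or store requests.
    """

    def __init__(self, capacity: int,
                 write_addr: AffineGen, write_time: AffineGen,
                 read_addr: AffineGen, read_time: AffineGen) -> None:
        self.capacity = capacity
        self.mem = [None] * capacity
        # Map firing cycle -> address for each port.
        self.writes = dict(zip(write_time.values(), write_addr.values()))
        self.reads = dict(zip(read_time.values(), read_addr.values()))

    def run(self, input_stream: Iterable[int]) -> List[Tuple[int, int]]:
        """Consume the input stream and return (cycle, value) output pairs."""
        src = iter(input_stream)
        out = []
        last_cycle = max(list(self.writes) + list(self.reads))
        for cycle in range(last_cycle + 1):
            # Assumed ordering: a write in a cycle lands before a read in that cycle.
            if cycle in self.writes:
                self.mem[self.writes[cycle] % self.capacity] = next(src)
            if cycle in self.reads:
                out.append((cycle, self.mem[self.reads[cycle] % self.capacity]))
        return out


# Configure the buffer as a 2-cycle delay line over a 4-entry circular store.
buf = UnifiedBufferSketch(
    capacity=4,
    write_addr=AffineGen(offset=0, stride=1, extent=6),
    write_time=AffineGen(offset=0, stride=1, extent=6),
    read_addr=AffineGen(offset=0, stride=1, extent=6),
    read_time=AffineGen(offset=2, stride=1, extent=6),
)
delayed = buf.run([10, 11, 12, 13, 14, 15])
print(delayed)
```

In this toy configuration the buffer accepts one value per cycle starting at cycle 0 and pushes each value out two cycles later, so the output stream is the input stream shifted by two cycles. Changing only the generator parameters, and not the surrounding code, changes the access pattern, which is the separation between configuration and storage that the sketch is meant to suggest.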


Pages: 26
