FTDL: A Tailored FPGA-Overlay for Deep Learning with High Scalability

Cited by: 8
Authors
Shi, Runbin [1]
Ding, Yuhao [1]
Wei, Xuechao [2]
Li, He [3]
Liu, Hang [4]
So, Hayden K. H. [1]
Ding, Caiwen [5]
Affiliations
[1] Univ Hong Kong, Hong Kong, Peoples R China
[2] Peking Univ, Beijing, Peoples R China
[3] Univ Cambridge, Cambridge, England
[4] Stevens Inst Technol, Hoboken, NJ 07030 USA
[5] Univ Connecticut, Storrs, CT USA
Funding
National Science Foundation (USA)
DOI
10.1109/dac18072.2020.9218581
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Fast inference is of paramount value to a wide range of deep learning applications. This work presents FTDL, a highly scalable FPGA overlay framework for deep learning applications, to address the architecture and hardware mismatch faced by traditional efforts. The FTDL overlay is specifically optimized for the tiled structure of FPGAs, thereby achieving post-place-and-route operating frequencies exceeding 88% of the theoretical maximum across different devices and design scales. A flexible compilation framework efficiently schedules the matrix multiply and convolution operations of large neural network inference on the overlay and achieves over 80% hardware efficiency on average. Taking advantage of both high operating frequency and hardware efficiency, FTDL achieves 402.6 and 151.2 FPS with GoogLeNet and ResNet50 on ImageNet, respectively, while operating at a power efficiency of 27.6 GOPS/W, delivering up to 7.7x higher performance and 1.9x better power efficiency than the state of the art.
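To make the notion of hardware efficiency in the abstract concrete, the sketch below estimates the utilization of a processing-element (PE) array when a matrix multiply is split into tiles. It is only an illustration under stated assumptions: the function name `tiled_matmul_efficiency`, the output-stationary mapping, and the one-MAC-per-PE-per-cycle model are hypothetical and are not taken from the FTDL paper or its compiler.

```python
import math


def tiled_matmul_efficiency(m, n, k, pe_rows, pe_cols):
    """Estimate PE utilization for C[m x n] = A[m x k] @ B[k x n].

    Assumed (hypothetical) model, not FTDL's actual scheduler:
    - each PE owns one output element of the current tile,
    - each PE performs one multiply-accumulate (MAC) per cycle,
    - a tile therefore takes k cycles, and partially filled
      tiles leave some PEs idle.
    """
    useful_macs = m * n * k
    # Number of output tiles needed to cover the m x n result.
    tiles = math.ceil(m / pe_rows) * math.ceil(n / pe_cols)
    # Every tile occupies the whole array for k cycles.
    total_pe_cycles = tiles * k * pe_rows * pe_cols
    return useful_macs / total_pe_cycles


if __name__ == "__main__":
    # Dimensions divide evenly: no PE is left idle.
    print(tiled_matmul_efficiency(64, 64, 128, 32, 32))
    # One extra row forces a second, almost empty tile.
    print(tiled_matmul_efficiency(33, 32, 10, 32, 32))
```

Under this simplified model, efficiency drops whenever layer dimensions do not divide evenly into the array shape, which is the kind of mismatch a scheduling framework tries to minimize.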
Pages: 6