In recent years, Generative Adversarial Networks (GANs) have been widely adopted for computer vision tasks such as large-scale image synthesis and 3D object modeling. Hardware acceleration of transposed convolution layers is especially essential, since the generative model (generator), a critical component of GANs, is computationally intensive in nature. In transposed convolution, the zero-insertion preprocessing step makes the input feature maps sparse, which in turn results in many invalid (multiply-by-zero) operations. Most existing FPGA architectures cannot effectively tackle this issue. To address the challenges of implementing transposed convolution on FPGAs, we present an innovative dataflow design that applies the Winograd algorithm for fast processing with high resource efficiency. In addition, we propose an underlying hardware accelerator architecture featuring processing units (PUs) embedded in a parallel, pipelined, and buffered processing flow. In this paper, a parallelism-aware memory partition scheme is also exploited for bandwidth-efficient data access. Implementations of several state-of-the-art GANs with our approach achieve an average performance of 639.2 GOPS on a Xilinx ZCU102 FPGA device. Compared with an optimized conventional accelerator baseline, this work demonstrates an 8.6x (up to 11.7x) improvement in processing performance, versus below 2.2x improvement by other works in the literature.
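The sparsity problem mentioned above can be made concrete with a minimal sketch, not taken from the paper: a 1-D transposed convolution expressed as zero-insertion followed by an ordinary convolution, counting how many multiplications touch an inserted zero. All function names and the 1-D simplification are illustrative assumptions.

```python
# Illustrative sketch (hypothetical names): the zero-insertion view of
# transposed convolution, 1-D case, stride 2.

def zero_insert(x, stride):
    """Insert (stride - 1) zeros between neighbouring input elements."""
    out = []
    for i, v in enumerate(x):
        out.append(v)
        if i < len(x) - 1:
            out.extend([0] * (stride - 1))
    return out

def conv1d(x, w):
    """Plain valid convolution; also count multiplications whose
    input operand is zero (the 'invalid' operations)."""
    k = len(w)
    y, zero_mults = [], 0
    for i in range(len(x) - k + 1):
        acc = 0
        for j in range(k):
            if x[i + j] == 0:
                zero_mults += 1
            acc += x[i + j] * w[j]
        y.append(acc)
    return y, zero_mults

x = [1, 2, 3, 4]
w = [1, 1, 1]
up = zero_insert(x, stride=2)   # -> [1, 0, 2, 0, 3, 0, 4]
y, wasted = conv1d(up, w)       # 7 of the 15 multiplications hit a zero
```

Even in this tiny example, roughly half of the multiply operands are inserted zeros; at realistic feature-map sizes and strides this wasted work is what the proposed Winograd-based dataflow avoids.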