An Energy-Efficient GAN Accelerator With On-Chip Training for Domain-Specific Optimization

Cited by: 3
Authors
Kim, Soyeon [1 ]
Kang, Sanghoon [1 ]
Han, Donghyeon [1 ]
Kim, Sangjin [1 ]
Kim, Sangyeob [1 ]
Yoo, Hoi-Jun [1 ]
Affiliation
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Daejeon 34141, South Korea
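Keywords
Deep learning; generative adversarial network (GAN); instance normalization (IN); local learning
DOI
10.1109/JSSC.2021.3094469
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract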
Generative adversarial networks (GANs) consist of multiple deep neural networks that cooperate and compete with each other. Because of their complex architectures and large feature maps, training GANs requires a large amount of computation. Moreover, instance normalization (IN) layers in GANs dramatically increase external memory access (EMA). Nevertheless, retraining GANs with user-specific data is critical on mobile devices because a pre-trained model outputs distorted images under user-specific conditions. This article proposes a GAN training accelerator that enables energy-efficient, domain-specific optimization of GANs with the user's local data. Selective layer retraining (SELRET) picks out the layers that are effective in enhancing the quality of the retrained model and reduces the required computation by 69% without degrading image quality. In addition, reordering layers for instance normalization (ROLIN) is proposed to reduce the EMA of intermediate data. By splitting and reordering the IN layers, the proposed architecture reduces overall EMA by 38.7% in the forward propagation (FP) stage and by 32.2% in the error propagation (EP) stage. The proposed processor is fabricated in a 65-nm CMOS process and shows an energy efficiency of 0.38 TFLOPS/W. The chip can retrain a face modification GAN with a custom dataset of 256 × 256 images over 100 epochs in under 30 s while consuming only 274 mW. Compared to the previous FPGA implementation, this work improves retraining performance and energy efficiency by 2× and 39×, respectively. As a result, the proposed accelerator enables domain-specific optimization of GANs on a mobile platform.
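For context, the following is a minimal Python sketch of the textbook instance normalization computation that the abstract refers to. It is not the paper's ROLIN dataflow or hardware implementation; the function name `instance_norm`, the nested-list layout, and the `eps` value are assumptions chosen for illustration. The sketch shows that the statistics of a channel must be known before any normalized output can be produced, so each channel is traversed twice. This is one plausible reason intermediate feature maps generate extra memory traffic when they do not fit on chip.

```python
import math

def instance_norm(feature_map, eps=1e-5):
    """Normalize each channel of one sample by its own mean and variance.

    feature_map: list of channels, each a list of rows (H x W floats).
    The layout and eps are illustrative assumptions, not taken from the paper.
    """
    out = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        # Pass 1: the mean and variance need the whole channel.
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        scale = 1.0 / math.sqrt(var + eps)
        # Pass 2: every element is visited again to be normalized.
        out.append([[(v - mean) * scale for v in row] for row in channel])
    return out

# Two channels of a single 2 x 2 sample.
sample = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0, 10.0], [10.0, 14.0]],
]
normalized = instance_norm(sample)
```

After the call, each channel of `normalized` has approximately zero mean and unit variance, independently of the other channels.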
Pages: 2968-2980
Number of pages: 13
Related papers
50 in total
  • [21] Design of an Energy-Efficient Accelerator for Training of Convolutional Neural Networks using Frequency-Domain Computation
    Ko, Jong Hwan
    Mudassar, Burhan
    Na, Taesik
    Mukhopadhyay, Saibal
    [J]. PROCEEDINGS OF THE 2017 54TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2017,
  • [22] An energy-efficient image filtering interpolation algorithm using domain-specific dynamic reconfigurable array processor
    Guo, Aiying
    Lin, Enlin
    Zhang, Jianhua
    Liu, Jingjing
    [J]. INTEGRATION-THE VLSI JOURNAL, 2024, 96
  • [23] Domain-Specific Computing Using FPGA Accelerator
    Watanabe, Yasuhiro
    Fujisawa, Hisanori
    Ozawa, Toshihiro
    [J]. FUJITSU SCIENTIFIC & TECHNICAL JOURNAL, 2017, 53 (05): : 20 - 25
  • [24] Energy-Efficient Transceiver Circuits for Short-Range On-chip Interconnects
    Postman, Jacob
    Chiang, Patrick
    [J]. 2011 IEEE CUSTOM INTEGRATED CIRCUITS CONFERENCE (CICC), 2011,
  • [25] EnAAM: Energy-Efficient Anti-Aging for On-Chip Video Memories
    Shafique, Muhammad
    Khan, Muhammad Usman Karim
    Tuefek, Orcun
    Henkel, Joerg
    [J]. 2015 52ND ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2015,
  • [26] Towards Scalable, Energy-Efficient, Bus-Based On-Chip Networks
    Udipi, Aniruddha N.
    Muralimanohar, Naveen
    Balasubramonian, Rajeev
    [J]. HPCA-16 2010: SIXTEENTH INTERNATIONAL SYMPOSIUM ON HIGH-PERFORMANCE COMPUTER ARCHITECTURE, PROCEEDINGS, 2010, : 247 - +
  • [27] Energy-Efficient eDRAM-Based On-Chip Storage Architecture for GPGPUs
    Jing, Naifeng
    Jiang, Li
    Zhang, Tao
    Li, Chao
    Fan, Fengfeng
    Liang, Xiaoyao
    [J]. IEEE TRANSACTIONS ON COMPUTERS, 2016, 65 (01) : 122 - 135
  • [28] Exploiting variable cycle transmission energy-efficient on-chip interconnect design
    Kalyan, T. Venkata
    Mutyam, Madhu
    Rao, P. Vijaya Sankara
    [J]. 21ST INTERNATIONAL CONFERENCE ON VLSI DESIGN: HELD JOINTLY WITH THE 7TH INTERNATIONAL CONFERENCE ON EMBEDDED SYSTEMS, PROCEEDINGS, 2008, : 235 - +
  • [29] A gracefully degrading and energy-efficient modular router architecture for on-chip networks
    Kim, Jongman
    Nicopoulos, Chrysostomos
    Park, Dongkook
    Naravanan, Vijaykrishnan
    Youssif, Mazin S.
    Das, Chita R.
    [J]. 33RD INTERNATIONAL SYMPOSIUM ON COMPUTER ARCHITECTURE, PROCEEDINGS, 2006, : 4 - 15
  • [30] An Energy-efficient On-chip Learning Architecture for STDP based Sparse Coding
    Kim, Heetak
    Tang, Hoyoung
    Park, Jongsun
    [J]. 2019 IEEE/ACM INTERNATIONAL SYMPOSIUM ON LOW POWER ELECTRONICS AND DESIGN (ISLPED), 2019,