Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture

Cited by: 19
Authors
Wang, Yi [1 ]
Chen, Weixuan [2 ]
Yang, Jing [3 ]
Li, Tao [4 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Harbin Inst Technol, Expt & Innovat Practice Ctr, Shenzhen 518055, Peoples R China
[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
National Natural Science Foundation of China;
Keywords
Near-data processing; neuromorphic computing; scheduling; memory management; parallel computing;
DOI
10.1109/TPDS.2018.2868062
CLC Number
TP301 [Theory, Methods];
Subject Classification
081202;
Abstract
Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems, delivering unprecedented accuracy but at the cost of a substantial amount of data movement. Although the emerging processing-in-memory (PIM) architecture seeks to minimize data movement by placing processing elements near memory, memory remains the major bottleneck of the entire system. The selection of hyper-parameters in the training of CNN applications requires hundreds of kilobytes of cache capacity for the concurrent processing of convolutions. How to jointly exploit the computation capability of the PIM architecture and the highly parallel nature of neural networks remains a critical issue. This paper presents Para-Net, which exploits Parallelism for deterministic convolutional neural Networks on the PIM architecture. Para-Net achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to capture the characteristics of neural networks and to provide a hardware-independent design that jointly optimizes the scheduling of both intermediate results and computation tasks. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. To demonstrate the viability of Para-Net, we conduct a set of experiments on a variety of realistic CNN applications, whose graph abstractions are obtained from the deep learning framework Caffe. Experimental results show that Para-Net significantly reduces processing time and improves cache efficiency compared with representative schemes.
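The abstract describes formulating data allocation as a dynamic programming model; the paper's actual formulation is not reproduced in this record, but the flavor of such a model can be illustrated with a toy sketch. The example below is purely hypothetical: it treats allocation as a 0/1 knapsack solved by dynamic programming, choosing which intermediate feature maps stay in a fixed-capacity on-chip cache so that off-chip data movement is minimized. The function name `allocate` and all sizes and costs are invented for illustration and do not come from the paper.

```python
def allocate(sizes, reuse_cost, capacity):
    """Hypothetical DP sketch of cache allocation for intermediate results.

    sizes[i]     -- on-chip footprint (KB) of intermediate result i
    reuse_cost[i]-- data-movement cost paid if i is spilled off-chip
    capacity     -- available on-chip cache (KB)
    Returns (minimum total movement cost, set of cached result indices).
    """
    n = len(sizes)
    # best[c] = maximum movement cost avoided using at most c KB of cache
    best = [0] * (capacity + 1)
    keep = [[False] * n for _ in range(capacity + 1)]
    for i in range(n):
        # iterate capacities downward so each result is cached at most once
        for c in range(capacity, sizes[i] - 1, -1):
            cand = best[c - sizes[i]] + reuse_cost[i]
            if cand > best[c]:
                best[c] = cand
                keep[c] = keep[c - sizes[i]][:]
                keep[c][i] = True
    cached = {i for i in range(n) if keep[capacity][i]}
    total = sum(reuse_cost) - best[capacity]  # cost of everything spilled
    return total, cached

# Three intermediate results, 160 KB of cache: caching results 1 and 2
# (128 + 32 KB) avoids the most off-chip traffic.
cost, cached = allocate(sizes=[64, 128, 32], reuse_cost=[10, 40, 5], capacity=160)
```

The knapsack structure only captures the capacity/movement trade-off; the paper's model additionally schedules computation tasks across PEs, which this sketch does not attempt.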
Pages: 589-600 (12 pages)