Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture

Cited by: 19
Authors
Wang, Yi [1 ]
Chen, Weixuan [2 ]
Yang, Jing [3 ]
Li, Tao [4 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Natl Engn Lab Big Data Syst Comp Technol, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Harbin Inst Technol, Expt & Innovat Practice Ctr, Shenzhen 518055, Peoples R China
[4] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
Funding
National Natural Science Foundation of China;
Keywords
Near-data processing; neuromorphic computing; scheduling; memory management; parallel computing;
DOI
10.1109/TPDS.2018.2868062
CLC Number
TP301 [Theory, Methods];
Subject Classification
081202;
Abstract
Deep convolutional neural networks (CNNs) are widely adopted in intelligent systems, delivering unprecedented accuracy but at the cost of a substantial amount of data movement. Although the emerging processing-in-memory (PIM) architecture seeks to minimize data movement by placing processing elements near memory, memory remains the major bottleneck of the entire system. The selection of hyper-parameters in the training of CNN applications requires hundreds of kilobytes of cache capacity for the concurrent processing of convolutions. How to jointly exploit the computation capability of the PIM architecture and the highly parallel nature of neural networks remains a critical issue. This paper presents Para-Net, which exploits Parallelism for deterministic convolutional neural Networks on the PIM architecture. Para-Net achieves data-level parallelism for convolutions by fully utilizing the on-chip processing engines (PEs) in PIM. The objective is to capture the characteristics of neural networks and to provide a hardware-independent design that jointly optimizes the scheduling of both intermediate results and computation tasks. We formulate this data allocation problem as a dynamic programming model and obtain an optimal solution. To demonstrate the viability of Para-Net, we conduct a set of experiments on a variety of realistic CNN applications, whose graph abstractions are obtained from the deep learning framework Caffe. Experimental results show that Para-Net significantly reduces processing time and improves cache efficiency compared with representative schemes.
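The abstract describes formulating data allocation as a dynamic programming model; the paper's actual formulation is not reproduced in this record, but the flavor of such a model can be illustrated with a toy sketch. The example below is purely hypothetical: it treats allocation as a 0/1 knapsack solved by dynamic programming, choosing which intermediate feature maps stay in a fixed-capacity on-chip cache so that off-chip data movement is minimized. The function name `allocate` and all sizes and costs are invented for illustration and do not come from the paper.

```python
def allocate(sizes, reuse_cost, capacity):
    """Hypothetical DP sketch of cache allocation for intermediate results.

    sizes[i]     -- on-chip footprint (KB) of intermediate result i
    reuse_cost[i]-- data-movement cost paid if i is spilled off-chip
    capacity     -- available on-chip cache (KB)
    Returns (minimum total movement cost, set of cached result indices).
    """
    n = len(sizes)
    # best[c] = maximum movement cost avoided using at most c KB of cache
    best = [0] * (capacity + 1)
    keep = [[False] * n for _ in range(capacity + 1)]
    for i in range(n):
        # iterate capacities downward so each result is cached at most once
        for c in range(capacity, sizes[i] - 1, -1):
            cand = best[c - sizes[i]] + reuse_cost[i]
            if cand > best[c]:
                best[c] = cand
                keep[c] = keep[c - sizes[i]][:]
                keep[c][i] = True
    cached = {i for i in range(n) if keep[capacity][i]}
    total = sum(reuse_cost) - best[capacity]  # cost of everything spilled
    return total, cached

# Three intermediate results, 160 KB of cache: caching results 1 and 2
# (128 + 32 KB) avoids the most off-chip traffic.
cost, cached = allocate(sizes=[64, 128, 32], reuse_cost=[10, 40, 5], capacity=160)
```

The knapsack structure only captures the capacity/movement trade-off; the paper's model additionally schedules computation tasks across PEs, which this sketch does not attempt.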
Pages: 589-600 (12 pages)