Partition Pruning: Parallelization-Aware Pruning for Dense Neural Networks

Citations: 0

Authors:

Shahhosseini, Sina [1]

Albaqsami, Ahmad [1]

Jasemi, Masoomeh [1,2]

Bagherzadeh, Nader [1]

Affiliations:

[1] Univ Calif Irvine, Elect Engn & Comp Sci Dept, Irvine, CA 92697 USA

[2] Sharif Univ Technol, Dept Comp Engn, Tehran, Iran

Source:

2020 28TH EUROMICRO INTERNATIONAL CONFERENCE ON PARALLEL, DISTRIBUTED AND NETWORK-BASED PROCESSING (PDP 2020) | 2020

Keywords:

Parallelization; Deep Neural Network; Pruning; Partitioning; Hardware Accelerator;

DOI:

10.1109/PDP50117.2020.00053

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

As neural networks are improved to become more accurate, their model sizes are growing exponentially. As a result, a huge number of parameters must be loaded from and stored in the memory hierarchy and processed to perform the training or inference phase of neural network processing. The increasing number of parameters poses a major challenge for real-time deployment, since improvements in memory bandwidth cannot keep up with the growth in model complexity. Although some operations in neural network processing, such as convolutional layer computation, are computationally intensive, computing dense layers faces a memory bandwidth bottleneck. To address this issue, the paper proposes Partition Pruning for dense layers, which reduces the number of required parameters while taking parallelization into consideration. We evaluated the performance and energy consumption of parallel inference of partitioned models, which showed a 7.72x speedup and a 2.73x reduction in the energy used for computing the pruned fully connected layers of the TinyVGG16 model, compared to running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.


Pages: 307-311

Page count: 5
