Performance-Aware Approximation of Global Channel Pruning for Multitask CNNs

Cited by: 15

Authors:

Ye, Hancheng [1]

Zhang, Bo [3]

Chen, Tao [1]

Fan, Jiayuan [2]

Wang, Bin [1]

Affiliations:

[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China

[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China

[3] Shanghai AI Lab, Shanghai 200232, Peoples R China

Source:

IEEE Transactions on Pattern Analysis and Machine Intelligence | 2023 / Vol. 45 / No. 8

Funding:

National Natural Science Foundation of China

Keywords:

Channel pruning; multitask learning; performance-aware oracle criterion; sequentially greedy algorithm; DEEP; MODELS

DOI:

10.1109/TPAMI.2023.3260903

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers of a deep model without hurting its performance. Previous works focus either on single-task model pruning or on simply adapting it to the multitask scenario, and still face the following problems when handling multitask pruning: 1) Due to task mismatch, a backbone that is well pruned for the classification task focuses on preserving filters that extract category-sensitive information, so filters that may be useful for other tasks are pruned during the backbone pruning stage; 2) For multitask predictions, filters within and between layers are more closely related and interact more strongly than in single-task prediction, making multitask pruning more difficult. Therefore, aiming at multitask model compression, we propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters within and across layers. Then a sequentially greedy pruning strategy is proposed to optimize the objective, in which a performance-aware oracle criterion is developed to evaluate the sensitivity of filters to each task and preserve the globally most task-related filters. Experiments on several multitask datasets show that the proposed PAGCP can reduce FLOPs and parameters by over 60% with a minor performance drop, and achieves 1.2x to 3.3x acceleration on both cloud and mobile platforms. Our code is available at http://www.github.com/HankYe/PAGCP.git.
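The sequentially greedy, performance-aware idea in the abstract can be illustrated with a toy Python sketch. This is not the authors' implementation (their repository is linked above); it only assumes that layers are visited in order, that channels are tried from least to most salient, and that a removal is kept only when no task's loss rises by more than a fixed relative budget. All names here (`greedy_prune`, `toy_task_loss`, `IMPORTANCE`, `SALIENCY`, the 5% budget) and all numbers are hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Channel = Tuple[int, int]  # (layer index, channel index)


def greedy_prune(
    saliency: Dict[int, List[float]],
    task_loss: Callable[[Set[Channel]], Dict[str, float]],
    max_rel_drop: float,
) -> Set[Channel]:
    """Return the set of pruned channels.

    Layers are visited in order. Within a layer, channels are tried from
    least to most salient, and a removal is accepted only if the worst
    relative loss increase over all tasks stays within max_rel_drop.
    """
    pruned: Set[Channel] = set()
    baseline = task_loss(pruned)  # per-task loss of the unpruned model
    for layer in sorted(saliency):
        scores = saliency[layer]
        order = sorted(range(len(scores)), key=lambda c: scores[c])
        for ch in order[:-1]:  # always keep at least one channel per layer
            candidate = pruned | {(layer, ch)}
            losses = task_loss(candidate)
            worst = max((losses[t] - baseline[t]) / baseline[t] for t in baseline)
            if worst <= max_rel_drop:
                pruned = candidate  # accept: every task stays within budget
    return pruned


# Hypothetical per-task loss increase caused by removing each channel.
IMPORTANCE: Dict[Channel, Dict[str, float]] = {
    (0, 0): {"cls": 0.00, "det": 0.01},
    (0, 1): {"cls": 0.01, "det": 0.30},  # unimportant for cls, vital for det
    (0, 2): {"cls": 0.40, "det": 0.02},
    (1, 0): {"cls": 0.02, "det": 0.02},
    (1, 1): {"cls": 0.50, "det": 0.50},
}

# Hypothetical saliency scores (for example, filter norms) per layer.
SALIENCY: Dict[int, List[float]] = {0: [0.1, 0.2, 0.9], 1: [0.3, 0.8]}


def toy_task_loss(pruned: Set[Channel]) -> Dict[str, float]:
    """Toy stand-in for evaluating the pruned model on each task."""
    return {
        task: 1.0 + sum(IMPORTANCE[ch][task] for ch in pruned)
        for task in ("cls", "det")
    }


pruned = greedy_prune(SALIENCY, toy_task_loss, max_rel_drop=0.05)
print(sorted(pruned))
```

In this toy setup, channel (0, 1) has low saliency and barely affects the classification loss, yet it is kept because removing it would push the detection loss past the budget. This mirrors the task-mismatch problem the abstract describes.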

Pages: 10267-10284

Page count: 18

