Performance-Aware Approximation of Global Channel Pruning for Multitask CNNs

Cited by: 15
Authors
Ye, Hancheng [1]
Zhang, Bo [3]
Chen, Tao [1]
Fan, Jiayuan [2]
Wang, Bin [1]
Affiliations
[1] Fudan Univ, Sch Informat Sci & Technol, Shanghai 200433, Peoples R China
[2] Fudan Univ, Acad Engn & Technol, Shanghai 200433, Peoples R China
[3] Shanghai AI Lab, Shanghai 200232, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Channel pruning; multitask learning; performance-aware oracle criterion; sequentially greedy algorithm; DEEP; MODELS;
DOI
10.1109/TPAMI.2023.3260903
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Global channel pruning (GCP) aims to remove a subset of channels (filters) across different layers from a deep model without hurting the performance. Previous works focus on either single-task model pruning or simply adapting it to the multitask scenario, and still face the following problems when handling multitask pruning: 1) Due to the task mismatch, a well-pruned backbone for the classification task focuses on preserving filters that can extract category-sensitive information, causing filters that may be useful for other tasks to be pruned during the backbone pruning stage; 2) For multitask predictions, different filters within or between layers are more closely related and interact more strongly than for single-task prediction, making multitask pruning more difficult. Therefore, aiming at multitask model compression, we propose a Performance-Aware Global Channel Pruning (PAGCP) framework. We first theoretically present the objective for achieving superior GCP by considering the joint saliency of filters within and across layers. Then a sequentially greedy pruning strategy is proposed to optimize the objective, where a performance-aware oracle criterion is developed to evaluate the sensitivity of filters to each task and preserve the globally most task-related filters. Experiments on several multitask datasets show that the proposed PAGCP can reduce the FLOPs and parameters by over 60% with a minor performance drop, and achieves 1.2x to 3.3x acceleration on both cloud and mobile platforms. Our code is available at http://www.github.com/HankYe/PAGCP.git.
Pages: 10267-10284
Page count: 18