GrOD: Deep Learning with Gradients Orthogonal Decomposition for Knowledge Transfer, Distillation, and Adversarial Training

Citations: 7
|
Authors
Xiong, Haoyi [1 ]
Wan, Ruosi [1 ,2 ]
Zhao, Jian [3 ]
Chen, Zeyu [1 ]
Li, Xingjian [1 ]
Zhu, Zhanxing [2 ]
Huan, Jun [1 ]
Affiliations
[1] Baidu Inc, Baidu Technol Pk, Beijing 100085, Peoples R China
[2] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[3] Inst North Elect Equipment, Beijing, Peoples R China
Funding
National Key Research and Development Program of China; US National Science Foundation;
Keywords
Deep neural networks; regularized deep learning; gradient-based learning;
DOI
10.1145/3530836
CLC Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Regularization that uses a linear combination of the empirical loss and explicit regularization terms as the loss function is frequently employed in many machine learning tasks. The explicit regularization term takes different forms depending on the application. While regularized learning often boosts performance with higher accuracy and faster convergence, the regularization can sometimes hurt empirical loss minimization and lead to poor performance. To address such issues, in this work we propose a novel strategy, namely Gradients Orthogonal Decomposition (GrOD), that improves the training procedure of regularized deep learning. Instead of linearly combining the gradients of the two terms, GrOD re-estimates, through orthogonal decomposition, a new update direction that does not hurt empirical loss minimization while preserving the regularization effects. We have performed extensive experiments applying GrOD to improve the commonly used algorithms of transfer learning [2], knowledge distillation [3], and adversarial learning [4]. Experimental results on large datasets, including Caltech 256 [5], MIT Indoor 67 [6], CIFAR-10 [7], and ImageNet [8], show significant improvements from GrOD for all three algorithms in all cases.
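The orthogonal-decomposition idea described in the abstract can be illustrated with a short sketch. This is a hypothetical interpretation rather than the authors' published algorithm: the regularizer's gradient g_reg is split into a component parallel to the empirical-loss gradient g_emp and an orthogonal remainder, and the parallel part is discarded whenever it points against g_emp, so the combined update never works against empirical loss minimization. The function name grod_update_direction and the weight alpha are illustrative assumptions.

```python
import numpy as np

def grod_update_direction(g_emp, g_reg, alpha=1.0):
    """Hypothetical sketch of a GrOD-style update direction.

    g_reg is decomposed into a component parallel to g_emp and an
    orthogonal component; the parallel part is kept only when it does
    not oppose the descent direction of the empirical loss.
    """
    g_emp = np.asarray(g_emp, dtype=float)
    g_reg = np.asarray(g_reg, dtype=float)

    norm_sq = np.dot(g_emp, g_emp)
    if norm_sq == 0.0:
        # No empirical-loss signal: fall back to the regularizer alone.
        return alpha * g_reg

    # Project g_reg onto g_emp; the remainder is orthogonal to g_emp.
    coef = np.dot(g_reg, g_emp) / norm_sq
    g_par = coef * g_emp
    g_orth = g_reg - g_par

    # Drop the parallel part if it conflicts with g_emp (coef < 0),
    # so the regularizer cannot slow empirical-loss minimization.
    if coef < 0.0:
        g_par = np.zeros_like(g_par)

    return g_emp + alpha * (g_par + g_orth)
```

In a training loop, the returned vector would replace the naive sum g_emp + alpha * g_reg before the optimizer step; when the two gradients do not conflict, the sketch reduces to ordinary regularized gradient descent.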
Pages: 25
Related Papers
50 in total
  • [1] Light Deep Face Recognition based on Knowledge Distillation and Adversarial Training
    Liu, Jinjin; Li, Xiaonan
    2022 International Conference on Mechanical, Automation and Electrical Engineering (CMAEE), 2022: 127-132
  • [2] Topology-guided Adversarial Deep Mutual Learning for Knowledge Distillation
    Lai, X.; Qu, Y.-Y.; Xie, Y.; Pei, Y.-L.
    Zidonghua Xuebao/Acta Automatica Sinica, 2023, 49(01): 102-110
  • [3] Knowledge Distillation with Attention for Deep Transfer Learning of Convolutional Networks
    Li, Xingjian; Xiong, Haoyi; Chen, Zeyu; Huan, Jun; Liu, Ji; Xu, Cheng-Zhong; Dou, Dejing
    ACM Transactions on Knowledge Discovery from Data, 2022, 16(03)
  • [4] Melanoma detection using adversarial training and deep transfer learning
    Zunair, Hasib; Ben Hamza, A.
    Physics in Medicine and Biology, 2020, 65(13)
  • [5] A continual learning framework to train robust image recognition models by adversarial training and knowledge distillation
    Chou, Ting-Chun; Kuo, Yu-Cheng; Huang, Jhih-Yuan; Lee, Wei-Po
    Connection Science, 2024, 36(01)
  • [6] A Survey of Knowledge Distillation in Deep Learning
    Shao, R.-R.; Liu, Y.-A.; Zhang, W.; Wang, J.
    Jisuanji Xuebao/Chinese Journal of Computers, 2022, 45(08): 1638-1673
  • [7] Brain Tumor Segmentation based on Knowledge Distillation and Adversarial Training
    Hou, Yaqing; Li, Tianbo; Zhang, Qiang; Yu, Hua; Ge, Hongwei
    2021 International Joint Conference on Neural Networks (IJCNN), 2021
  • [8] GopGAN: Gradients Orthogonal Projection Generative Adversarial Network With Continual Learning
    Li, Xiaobin; Wang, Weiqiang
    IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(01): 215-227
  • [9] Subsampling and Knowledge Distillation on Adversarial Examples: New Techniques for Deep Learning Based Side Channel Evaluations
    Gohr, Aron; Jacob, Sven; Schindler, Werner
    Selected Areas in Cryptography, 2021, 12804: 567-592
  • [10] Knowledge distillation in deep learning and its applications
    Alkhulaifi, Abdolmaged; Alsahli, Fahad; Ahmad, Irfan
    PeerJ Computer Science, 2021, (07): 1-24