Diffused Redundancy in Pre-trained Representations

Cited: 0
Authors
Nanda, Vedant [1,2]
Speicher, Till [2]
Dickerson, John P. [1]
Gummadi, Krishna P. [2]
Feizi, Soheil [1]
Weller, Adrian [3,4]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] MPI SWS, Saarbrücken, Germany
[3] Alan Turing Inst, London, England
[4] Univ Cambridge, Cambridge, England
Keywords
NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Representations learned by pre-training a neural network on a large dataset are increasingly used, with considerable success, to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and performs similarly to the full layer on a variety of downstream tasks. For example, a linear probe trained on a randomly picked 20% of the neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also caution about possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.
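To make the probing setup described in the abstract concrete, the snippet below is a minimal PyTorch sketch (not the authors' released code) of training a linear probe on a random 20% subset of the 2048 penultimate-layer neurons of an ImageNet1k-pretrained ResNet50 for CIFAR10 classification. The specific weight tag, number of epochs, batch size, and learning rate are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the "linear probe on a random neuron subset" setup.
# Assumptions (not from the paper): ResNet50_Weights.IMAGENET1K_V2 weights,
# 5 epochs, batch size 256, Adam with lr=1e-3.
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained ResNet50; replace the classifier head with Identity to expose
# the 2048-dimensional penultimate-layer features.
backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
)
backbone.fc = nn.Identity()
backbone.eval().to(device)

# Randomly chosen subset of neurons: 20% of 2048 = 409 dimensions.
feat_dim, keep_frac = 2048, 0.2
subset = torch.randperm(feat_dim)[: int(keep_frac * feat_dim)].to(device)

# CIFAR10 resized to the input resolution the ImageNet backbone expects.
tfm = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)

# Linear probe that only sees the neuron subset; the backbone stays frozen.
probe = nn.Linear(len(subset), 10).to(device)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # a few epochs are enough to illustrate the setup
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            feats = backbone(x)            # (batch, 2048) penultimate features
        logits = probe(feats[:, subset])   # probe sees only the 20% subset
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Comparing this probe's test accuracy against an identical probe trained on all 2048 neurons reproduces the kind of subset-versus-full-layer comparison the abstract describes.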
Pages: 25