Diffused Redundancy in Pre-trained Representations

Cited by: 0
Authors
Nanda, Vedant [1,2]
Speicher, Till [2]
Dickerson, John P. [1]
Gummadi, Krishna P. [2]
Feizi, Soheil [1]
Weller, Adrian [3,4]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] MPI-SWS, Saarbrücken, Germany
[3] Alan Turing Institute, London, England
[4] University of Cambridge, Cambridge, England
Keywords
Neural networks
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and performs similarly to the whole layer on a variety of downstream tasks. For example, a linear probe trained on 20% of randomly picked neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also draw attention to certain possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.
Pages: 25
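
The sketch below illustrates the flavour of the experiment described in the abstract: training a linear probe on a random 20% subset of the 2048 penultimate-layer neurons of an ImageNet1k pre-trained ResNet50 for CIFAR10 classification. It is a minimal sketch, not the authors' implementation (see the repository linked above); the function name probe_step, the optimizer, the learning rate, and the assumption that CIFAR10 images are resized and normalized to the backbone's ImageNet input format are all illustrative choices.

```python
# Minimal sketch (PyTorch), not the authors' code: linear probe on a random
# subset of penultimate-layer neurons of an ImageNet1k pre-trained ResNet50.
import torch
import torch.nn as nn
import torchvision

# Frozen feature extractor: replace the classification head with identity so the
# model outputs the 2048-d penultimate-layer representation.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

feat_dim = 2048
subset_frac = 0.2                                        # keep 20% of neurons at random
subset_idx = torch.randperm(feat_dim)[: int(subset_frac * feat_dim)]

# Linear probe over the selected neuron subset only (10 CIFAR10 classes).
# Optimizer and learning rate are illustrative assumptions.
probe = nn.Linear(len(subset_idx), 10)
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    """One training step of the probe on a batch of downstream (CIFAR10) data.

    Assumes images have already been resized (e.g. to 224x224) and normalized
    with ImageNet statistics so the frozen backbone sees its expected input.
    """
    with torch.no_grad():
        feats = backbone(images)           # (B, 2048) frozen representations
    logits = probe(feats[:, subset_idx])   # restrict to the random neuron subset
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this with different subset fractions (and comparing against a probe trained on all 2048 neurons) would trace out the kind of redundancy-performance trade-off the abstract describes.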