Diffused Redundancy in Pre-trained Representations

Cited by: 0
Authors
Nanda, Vedant [1,2]
Speicher, Till [2]
Dickerson, John P. [1]
Gummadi, Krishna P. [2]
Feizi, Soheil [1]
Weller, Adrian [3,4]
Affiliations
[1] University of Maryland, College Park, MD 20742, USA
[2] MPI-SWS, Saarbrücken, Germany
[3] Alan Turing Institute, London, England
[4] University of Cambridge, Cambridge, England
Keywords
Neural networks
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Representations learned by pre-training a neural network on a large dataset are increasingly used successfully to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and performs similarly to the whole layer on a variety of downstream tasks. For example, a linear probe trained on 20% of randomly picked neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also draw attention to certain possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.
Pages: 25
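
The sketch below illustrates the flavour of the experiment described in the abstract: training a linear probe on a random 20% subset of the 2048 penultimate-layer neurons of an ImageNet1k pre-trained ResNet50 for CIFAR10 classification. It is a minimal sketch, not the authors' implementation (see the repository linked above); the function name probe_step, the optimizer, the learning rate, and the assumption that CIFAR10 images are resized and normalized to the backbone's ImageNet input format are all illustrative choices.

```python
# Minimal sketch (PyTorch), not the authors' code: linear probe on a random
# subset of penultimate-layer neurons of an ImageNet1k pre-trained ResNet50.
import torch
import torch.nn as nn
import torchvision

# Frozen feature extractor: replace the classification head with identity so the
# model outputs the 2048-d penultimate-layer representation.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
backbone.fc = nn.Identity()
backbone.eval()
for p in backbone.parameters():
    p.requires_grad_(False)

feat_dim = 2048
subset_frac = 0.2                                        # keep 20% of neurons at random
subset_idx = torch.randperm(feat_dim)[: int(subset_frac * feat_dim)]

# Linear probe over the selected neuron subset only (10 CIFAR10 classes).
# Optimizer and learning rate are illustrative assumptions.
probe = nn.Linear(len(subset_idx), 10)
optimizer = torch.optim.SGD(probe.parameters(), lr=1e-2, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def probe_step(images, labels):
    """One training step of the probe on a batch of downstream (CIFAR10) data.

    Assumes images have already been resized (e.g. to 224x224) and normalized
    with ImageNet statistics so the frozen backbone sees its expected input.
    """
    with torch.no_grad():
        feats = backbone(images)           # (B, 2048) frozen representations
    logits = probe(feats[:, subset_idx])   # restrict to the random neuron subset
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this with different subset fractions (and comparing against a probe trained on all 2048 neurons) would trace out the kind of redundancy-performance trade-off the abstract describes.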