Diffused Redundancy in Pre-trained Representations

Cited: 0
Authors
Nanda, Vedant [1,2]
Speicher, Till [2]
Dickerson, John P. [1]
Gummadi, Krishna P. [2]
Feizi, Soheil [1]
Weller, Adrian [3,4]
Affiliations
[1] Univ Maryland, College Pk, MD 20742 USA
[2] MPI SWS, Saarbrücken, Germany
[3] Alan Turing Inst, London, England
[4] Univ Cambridge, Cambridge, England
Keywords
NEURAL-NETWORKS;
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Representations learned by pre-training a neural network on a large dataset are increasingly used, with considerable success, to perform a variety of downstream tasks. In this work, we take a closer look at how features are encoded in such pre-trained representations. We find that learned representations in a given layer exhibit a degree of diffuse redundancy, i.e., any randomly chosen subset of neurons in the layer that is larger than a threshold size shares a large degree of similarity with the full layer and performs similarly to the full layer on a variety of downstream tasks. For example, a linear probe trained on a randomly picked 20% of the neurons from the penultimate layer of a ResNet50 pre-trained on ImageNet1k achieves an accuracy within 5% of a linear probe trained on the full layer of neurons for downstream CIFAR10 classification. We conduct experiments on different neural architectures (including CNNs and Transformers) pre-trained on both ImageNet1k and ImageNet21k and evaluate a variety of downstream tasks taken from the VTAB benchmark. We find that the loss and dataset used during pre-training largely govern the degree of diffuse redundancy, and that the "critical mass" of neurons needed often depends on the downstream task, suggesting that there is a task-inherent redundancy-performance Pareto frontier. Our findings shed light on the nature of representations learned by pre-trained deep neural networks and suggest that entire layers might not be necessary to perform many downstream tasks. We investigate the potential for exploiting this redundancy to achieve efficient generalization for downstream tasks and also caution about possible unintended consequences. Our code is available at https://github.com/nvedant07/diffused-redundancy.
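To make the probing setup described in the abstract concrete, the snippet below is a minimal PyTorch sketch (not the authors' released code) of training a linear probe on a random 20% subset of the 2048 penultimate-layer neurons of an ImageNet1k-pretrained ResNet50 for CIFAR10 classification. The specific weight tag, number of epochs, batch size, and learning rate are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of the "linear probe on a random neuron subset" setup.
# Assumptions (not from the paper): ResNet50_Weights.IMAGENET1K_V2 weights,
# 5 epochs, batch size 256, Adam with lr=1e-3.
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

# Pretrained ResNet50; replace the classifier head with Identity to expose
# the 2048-dimensional penultimate-layer features.
backbone = torchvision.models.resnet50(
    weights=torchvision.models.ResNet50_Weights.IMAGENET1K_V2
)
backbone.fc = nn.Identity()
backbone.eval().to(device)

# Randomly chosen subset of neurons: 20% of 2048 = 409 dimensions.
feat_dim, keep_frac = 2048, 0.2
subset = torch.randperm(feat_dim)[: int(keep_frac * feat_dim)].to(device)

# CIFAR10 resized to the input resolution the ImageNet backbone expects.
tfm = transforms.Compose([
    transforms.Resize(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
train_set = torchvision.datasets.CIFAR10("data", train=True, download=True, transform=tfm)
loader = torch.utils.data.DataLoader(train_set, batch_size=256, shuffle=True, num_workers=4)

# Linear probe that only sees the neuron subset; the backbone stays frozen.
probe = nn.Linear(len(subset), 10).to(device)
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):  # a few epochs are enough to illustrate the setup
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        with torch.no_grad():
            feats = backbone(x)            # (batch, 2048) penultimate features
        logits = probe(feats[:, subset])   # probe sees only the 20% subset
        loss = loss_fn(logits, y)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Comparing this probe's test accuracy against an identical probe trained on all 2048 neurons reproduces the kind of subset-versus-full-layer comparison the abstract describes.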
Pages: 25