Simplicity Bias in 1-Hidden Layer Neural Networks

Cited by: 0
Authors
Morwani, Depen [1 ]
Batra, Jatin [2 ]
Jain, Prateek [3 ]
Netrapalli, Praneeth [3 ]
Affiliations
[1] Harvard Univ, Dept Comp Sci, Cambridge, MA 02138 USA
[2] Tata Inst Fundamental Res TIFR, Sch Technol & Comp Sci, Hyderabad, India
[3] Google Res, Mountain View, CA USA
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent works (Shah et al., 2020; Chen et al., 2021) have demonstrated that neural networks exhibit extreme simplicity bias (SB). That is, they learn only the simplest features to solve the task at hand, even in the presence of other, more robust but more complex features. Due to the lack of a general and rigorous definition of features, these works showcase SB on semi-synthetic datasets such as Color-MNIST and MNIST-CIFAR, where defining features is relatively easier. In this work, we rigorously define as well as thoroughly establish SB for one-hidden-layer neural networks. More concretely, (i) we define SB as the network essentially being a function of a low-dimensional projection of the inputs; (ii) theoretically, in the infinite-width regime, we show that when the data is linearly separable, the network primarily depends on only the linearly separable (1-dimensional) subspace, even in the presence of an arbitrarily large number of other, more complex features which could have led to a significantly more robust classifier; (iii) empirically, we show that models trained on real datasets such as ImageNet and Waterbirds-Landbirds indeed depend on a low-dimensional projection of the inputs, thereby demonstrating SB on these datasets; (iv) finally, we present a natural ensemble approach that encourages diversity in models by training successive models on features not used by earlier models, and demonstrate that it yields models that are significantly more robust to Gaussian noise.
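As a rough illustration of the ensemble idea described in point (iv), the following minimal PyTorch sketch trains a one-hidden-layer network on synthetic data whose first coordinate is linearly separable, estimates the input subspace the network relies on from the top right-singular direction of its first-layer weights (an illustrative proxy, not the paper's actual procedure), projects that subspace out, trains a second network on the residual directions, and averages the two models' logits under Gaussian input noise. All names, the synthetic data construction, and the subspace estimate are assumptions made for illustration only.

```python
# Hypothetical sketch (not the authors' code): diversity-by-orthogonal-projection ensemble.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: coordinate 0 is the "simple" linearly separable feature;
# coordinates 1-2 carry a weaker, redundant signal.
n, d = 2000, 20
y = torch.randint(0, 2, (n,)).float()
x = torch.randn(n, d)
x[:, 0] = (2 * y - 1) * (1.0 + 0.1 * torch.randn(n))
x[:, 1:3] += 0.5 * (2 * y - 1).unsqueeze(1)

def make_net():
    return nn.Sequential(nn.Linear(d, 100), nn.ReLU(), nn.Linear(100, 1))

def train(net, inputs, targets, steps=500, lr=1e-2):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(net(inputs).squeeze(1), targets).backward()
        opt.step()
    return net

def accuracy(net, inputs, targets):
    with torch.no_grad():
        return ((net(inputs).squeeze(1) > 0).float() == targets).float().mean().item()

# Model 1: standard training; per the paper's claim, expected to lean on a
# low-dimensional projection of the inputs.
net1 = train(make_net(), x, y)

# Estimate the k-dimensional input subspace net1 uses via the top right-singular
# vectors of its first-layer weight matrix (illustrative proxy only).
k = 1
W1 = net1[0].weight.detach()                 # shape (100, d)
_, _, Vt = torch.linalg.svd(W1, full_matrices=False)
V = Vt[:k].T                                 # (d, k) basis of the used subspace
proj_out = torch.eye(d) - V @ V.T            # projector onto its orthogonal complement

# Model 2: trained only on the remaining directions, encouraging diversity.
net2 = train(make_net(), x @ proj_out, y)

# Evaluate under Gaussian input noise; the ensemble averages the two logits.
x_noisy = x + 0.5 * torch.randn_like(x)
with torch.no_grad():
    ens_logits = 0.5 * (net1(x_noisy) + net2(x_noisy @ proj_out))
    ens_acc = ((ens_logits.squeeze(1) > 0).float() == y).float().mean().item()
print("model 1 (noisy):", accuracy(net1, x_noisy, y))
print("model 2 (noisy):", accuracy(net2, x_noisy @ proj_out, y))
print("ensemble (noisy):", ens_acc)
```

Under these assumptions, model 1 typically leans on the simple coordinate and degrades under noise, while the second model is forced onto the remaining signal, so the averaged ensemble tends to be more robust; the subspace-estimation step is the piece one would replace with the paper's own procedure.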
Pages: 28
Related Papers
50 records in total
  • [1] The Pitfalls of Simplicity Bias in Neural Networks
    Shah, Harshay
    Tamuly, Kaustav
    Raghunathan, Aditi
    Jain, Prateek
    Netrapalli, Praneeth
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Neural network without bias neuron for hidden layer
    Majetic, D.
    Brezak, D.
    Novakovic, B.
    Kasac, J.
    Annals of DAAAM for 2005 & Proceedings of the 16th International DAAAM Symposium: INTELLIGENT MANUFACTURING & AUTOMATION: FOCUS ON YOUNG RESEARCHES AND SCIENTISTS, 2005, : 239 - 240
  • [3] Neural Networks with Marginalized Corrupted Hidden Layer
    Li, Yanjun
    Xin, Xin
    Guo, Ping
    NEURAL INFORMATION PROCESSING, PT III, 2015, 9491 : 506 - 514
  • [4] Neural networks for word recognition: Is a hidden layer necessary?
    Dandurand, Frederic
    Hannagan, Thomas
    Grainger, Jonathan
    COGNITION IN FLUX, 2010, : 688 - 693
  • [5] Regularization of hidden layer unit response for neural networks
    Taga, K
    Kameyama, K
    Toraichi, K
    2003 IEEE PACIFIC RIM CONFERENCE ON COMMUNICATIONS, COMPUTERS, AND SIGNAL PROCESSING, VOLS 1 AND 2, CONFERENCE PROCEEDINGS, 2003, : 348 - 351
  • [6] HOW TO DETERMINE THE STRUCTURE OF THE HIDDEN LAYER IN NEURAL NETWORKS
    Wei, Qiang
    Zhang, Shijun
    Zhang, Yongchuan
    WATER RESOURCES AND POWER (SHUIDIAN NENGYUAN KEXUE), 1997, (01) : 18 - 22
  • [7] Feedforward Neural Networks with a Hidden Layer Regularization Method
    Alemu, Habtamu Zegeye
    Wu, Wei
    Zhao, Junhong
    SYMMETRY-BASEL, 2018, 10 (10):
  • [8] Modular Expansion of the Hidden Layer in Single Layer Feedforward Neural Networks
    Tissera, Migel D.
    McDonnell, Mark D.
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 2939 - 2945
  • [9] Collapsing multiple hidden layers in feedforward neural networks to a single hidden layer
    Blue, JL
    Hall, LO
    APPLICATIONS AND SCIENCE OF ARTIFICIAL NEURAL NETWORKS II, 1996, 2760 : 44 - 52
  • [10] DEGREE OF APPROXIMATION BY NEURAL AND TRANSLATION NETWORKS WITH A SINGLE HIDDEN LAYER
    MHASKAR, HN
    MICCHELLI, CA
    ADVANCES IN APPLIED MATHEMATICS, 1995, 16 (02) : 151 - 183