IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization

Cited by: 0
Authors
Zhou, Wenxuan [1 ]
Lin, Bill Yuchen [1 ]
Ren, Xiang [1 ]
Affiliation
[1] Univ Southern Calif, Dept Comp Sci, Los Angeles, CA 90007 USA
Keywords
DOI
Not available
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-tuning pre-trained language models (PTLMs), such as BERT and its better variant RoBERTa, has been a common practice for advancing performance on natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, is relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization and point out two major issues: high variance in their standard deviation, and high correlation between different dimensions. We also propose a new network regularization method, isotropic batch normalization (IsoBN), to address these issues, towards learning more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields about a 1.0-point absolute improvement on the average of seven NLU tasks.
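As a rough illustration of the two isotropy issues named in the abstract (high variance in the per-dimension standard deviation, and high correlation between dimensions of the [CLS] embeddings) and of what down-weighting dominating principal components looks like, the following is a minimal NumPy sketch. It is not the authors' IsoBN implementation; the function names (isotropy_diagnostics, soften_dominant_components) and the interpolation parameter alpha are illustrative assumptions.

    # Minimal, illustrative sketch (not the authors' IsoBN code).
    # isotropy_diagnostics() computes the two quantities the abstract highlights;
    # soften_dominant_components() rescales principal components so the largest
    # ones no longer dominate, interpolating between "no change" (alpha=0) and
    # full whitening (alpha=1) via a hypothetical parameter alpha.
    import numpy as np

    def isotropy_diagnostics(cls_embeddings):
        """cls_embeddings: array of shape (num_examples, hidden_dim)."""
        std = cls_embeddings.std(axis=0)                   # per-dimension standard deviation
        corr = np.corrcoef(cls_embeddings, rowvar=False)   # (dim, dim) correlation matrix
        # Mean absolute off-diagonal correlation: 0 for perfectly uncorrelated dimensions.
        off_diag = np.abs(corr - np.diag(np.diag(corr))).mean()
        return std, off_diag

    def soften_dominant_components(cls_embeddings, alpha=0.5):
        """Shrink dominating principal components of the embedding matrix."""
        centered = cls_embeddings - cls_embeddings.mean(axis=0, keepdims=True)
        cov = np.cov(centered, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)             # eigenvalues in ascending order
        scale = np.clip(eigvals, 1e-8, None) ** (-0.5 * alpha)
        projected = centered @ eigvecs                     # coordinates in the PC basis
        return (projected * scale) @ eigvecs.T             # shrink large PCs, rotate back

    # Example with random stand-in embeddings (correlated and anisotropic on purpose):
    rng = np.random.default_rng(0)
    emb = rng.standard_normal((2000, 64)) @ rng.standard_normal((64, 64))
    std_before, corr_before = isotropy_diagnostics(emb)
    std_after, corr_after = isotropy_diagnostics(soften_dominant_components(emb, alpha=1.0))
    print(std_before.std(), std_after.std())               # spread of per-dimension std shrinks
    print(corr_before, corr_after)                         # off-diagonal correlation shrinks

In the paper itself, this kind of penalty is applied dynamically during fine-tuning rather than as a one-off post-hoc transform; the sketch only shows the diagnostics and the effect of suppressing dominant components on a fixed batch of vectors.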
Pages: 14621-14629
Number of pages: 9
Related Papers (50 in total)
  • [1] Transfer fine-tuning of BERT with phrasal paraphrases
    Arase, Yuki
    Tsujii, Junichi
    [J]. COMPUTER SPEECH AND LANGUAGE, 2021, 66
  • [2] Efficient Fine-Tuning of BERT Models on the Edge
    Vucetic, Danilo
    Tayaranian, Mohammadreza
    Ziaeefard, Maryam
    Clark, James J.
    Meyer, Brett H.
    Gross, Warren J.
    [J]. 2022 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 22), 2022, : 1838 - 1842
  • [3] Transfer Fine-Tuning: A BERT Case Study
    Arase, Yuki
    Tsujii, Junichi
    [J]. 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5393 - 5404
  • [4] SPEECH RECOGNITION BY SIMPLY FINE-TUNING BERT
    Huang, Wen-Chin
    Wu, Chia-Hua
    Luo, Shang-Bao
    Chen, Kuan-Yu
    Wang, Hsin-Min
    Toda, Tomoki
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 7343 - 7347
  • [5] Investigating Learning Dynamics of BERT Fine-Tuning
    Hao, Yaru
    Dong, Li
    Wei, Furu
    Xu, Ke
    [J]. 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020), 2020, : 87 - 92
  • [6] Patent classification by fine-tuning BERT language model
    Lee, Jieh-Sheng
    Hsiang, Jieh
    [J]. WORLD PATENT INFORMATION, 2020, 61
  • [7] Fine-Tuning BERT for Generative Dialogue Domain Adaptation
    Labruna, Tiziano
    Magnini, Bernardo
    [J]. TEXT, SPEECH, AND DIALOGUE (TSD 2022), 2022, 13502 : 513 - 524
  • [8] Dataset Distillation with Attention Labels for Fine-tuning BERT
    Maekawa, Aru
    Kobayashi, Naoki
    Funakoshi, Kotaro
    Okumura, Manabu
    [J]. 61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 119 - 127
  • [9] Noise Stability Regularization for Improving BERT Fine-tuning
    Hua, Hang
    Li, Xingjian
    Dou, Dejing
    Xu, Chengzhong
    Luo, Jiebo
    [J]. 2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 3229 - 3241
  • [10] A Closer Look at How Fine-tuning Changes BERT
    Zhou, Yichu
    Srikumar, Vivek
    [J]. PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 1046 - 1061