Towards a Better Understanding of Label Smoothing in Neural Machine Translation

Cited by: 0
Authors
Gao, Yingbo [1]
Wang, Weiyue [1]
Herold, Christian [1]
Yang, Zijian [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Comp Sci Dept, Human Language Technol & Pattern Recognit Grp, D-52056 Aachen, Germany
Source
1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020) | 2020
Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems. The core idea is to penalize over-confident outputs and regularize the model so that its outputs do not diverge too much from some prior distribution. While training perplexity generally gets worse, label smoothing is found to consistently improve test performance. In this work, we aim to better understand label smoothing in the context of neural machine translation. Theoretically, we derive and explain exactly what label smoothing is optimizing for. Practically, we conduct extensive experiments by varying which tokens to smooth, tuning the probability mass to be deducted from the true targets and considering different prior distributions. We show that label smoothing is theoretically well-motivated, and by carefully choosing hyperparameters, the practical performance of strong neural machine translation systems can be further improved.
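The core idea in the abstract, deducting some probability mass from the true target and spreading it according to a prior distribution, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the default `mass=0.1`, the example logits, and the choice to spread the prior over all classes including the true one are assumptions made for illustration only.

```python
import math


def smoothed_targets(true_index, vocab_size, mass=0.1, prior=None):
    """Build the smoothed target distribution for a single token.

    `mass` is the probability mass deducted from the true target; it is
    redistributed according to `prior` (assumed uniform if not given).
    """
    if prior is None:
        prior = [1.0 / vocab_size] * vocab_size
    targets = [mass * p for p in prior]
    targets[true_index] += 1.0 - mass
    return targets


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def label_smoothed_cross_entropy(logits, true_index, mass=0.1, prior=None):
    """Cross-entropy between the smoothed targets and the model output."""
    log_probs = log_softmax(logits)
    targets = smoothed_targets(true_index, len(logits), mass, prior)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Hypothetical logits for a 4-word vocabulary; index 0 is the true target.
logits = [4.0, 1.0, 0.5, 0.1]
plain = label_smoothed_cross_entropy(logits, 0, mass=0.0)
smoothed = label_smoothed_cross_entropy(logits, 0, mass=0.1)
print(plain, smoothed)
```

With `mass=0.0` the loss reduces to ordinary cross-entropy. For a confident prediction like the one above, the smoothed loss is larger, which is the penalty on over-confident outputs that the abstract describes.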
Pages: 212-223
Page count: 12
Related papers
50 in total
  • [21] Towards Making the Most of BERT in Neural Machine Translation
    Yang, Jiacheng
    Wang, Mingxuan
    Zhou, Hao
    Zhao, Chengqi
    Yu, Yong
    Zhang, Weinan
    Li, Lei
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9378 - 9385
  • [22] Towards String-to-Tree Neural Machine Translation
    Aharoni, Roee
    Goldberg, Yoav
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 2, 2017, : 132 - 140
  • [23] Detecting Source Contextual Barriers for Understanding Neural Machine Translation
    Li, Guanlin
    Liu, Lemao
    Zhu, Conghui
    Wang, Rui
    Zhao, Tiejun
    Shi, Shuming
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2021, 29 : 3158 - 3169
  • [24] Understanding and Improving the Robustness of Terminology Constraints in Neural Machine Translation
    Zhang, Huaao
    Wang, Qiang
    Qin, Bo
    Shi, Zelin
    Wang, Haibo
    Chen, Ming
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 6029 - 6042
  • [25] Towards Hybrid Neural Machine Translation for English-Latvian
    Pinnis, Marcis
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, 2016, 289 : 84 - 91
  • [26] Towards More Diverse Input Representation for Neural Machine Translation
    Chen, Kehai
    Wang, Rui
    Utiyama, Masao
    Sumita, Eiichiro
    Zhao, Tiejun
    Yang, Muyun
    Zhao, Hai
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1586 - 1597
  • [27] Towards Linear Time Neural Machine Translation with Capsule Networks
    Wang, Mingxuan
    Xie, Jun
    Tan, Zhixing
    Su, Jinsong
    Xiong, Deyi
    Li, Lei
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 803 - 812
  • [28] Towards Building a Strong Transformer Neural Machine Translation System
    Wang, Qiang
    Li, Bei
    Liu, Jiqiang
    Jiang, Bojian
    Zhang, Zheyang
    Li, Yinqiao
    Lin, Ye
    Xiao, Tong
    Zhu, Jingbo
    MACHINE TRANSLATION, CWMT 2018, 2019, 954 : 101 - 110
  • [29] Better Neural Machine Translation by Extracting Linguistic Information from BERT
    Shavarani, Hassan S.
    Sarkar, Anoop
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021, : 2772 - 2783
  • [30] UNDERSTANDING MACHINE TRANSLATION
    Varga, Agnes
    IDIMT-2006, 2006, 19 : 285 - 296