Towards a Better Understanding of Label Smoothing in Neural Machine Translation

Citations: 0
Authors
Gao, Yingbo [1]
Wang, Weiyue [1]
Herold, Christian [1]
Yang, Zijian [1]
Ney, Hermann [1]
Affiliations
[1] Rhein Westfal TH Aachen, Comp Sci Dept, Human Language Technol & Pattern Recognit Grp, D-52056 Aachen, Germany
Source
1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020) | 2020
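Funding
European Research Council;
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes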
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems. The core idea is to penalize over-confident outputs and regularize the model so that its outputs do not diverge too much from some prior distribution. While training perplexity generally gets worse, label smoothing is found to consistently improve test performance. In this work, we aim to better understand label smoothing in the context of neural machine translation. Theoretically, we derive and explain exactly what label smoothing is optimizing for. Practically, we conduct extensive experiments by varying which tokens to smooth, tuning the probability mass to be deducted from the true targets and considering different prior distributions. We show that label smoothing is theoretically well-motivated, and by carefully choosing hyperparameters, the practical performance of strong neural machine translation systems can be further improved.
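The mechanism described in the abstract, deducting some probability mass from the true target and redistributing it according to a prior, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smooth_targets`, `label_smoothed_loss`), the parameter name `mass`, and the default uniform prior are all assumptions chosen for illustration, and the uniform prior is only one of the priors the abstract says were considered.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def smooth_targets(true_index, vocab_size, mass=0.1, prior=None):
    """Build a smoothed target distribution.

    `mass` is the probability deducted from the true token (hypothetical
    parameter name); it is redistributed over the vocabulary according to
    `prior`, which defaults to a uniform distribution (an assumption).
    """
    if prior is None:
        prior = [1.0 / vocab_size] * vocab_size
    target = [mass * p for p in prior]
    target[true_index] += 1.0 - mass
    return target


def label_smoothed_loss(logits, true_index, mass=0.1, prior=None):
    """Cross-entropy between the smoothed target and the model output."""
    target = smooth_targets(true_index, len(logits), mass, prior)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs) if t > 0.0)


# An over-confident prediction: almost all mass on the correct token 0.
confident_logits = [10.0, 0.0, 0.0, 0.0]
plain = label_smoothed_loss(confident_logits, 0, mass=0.0)
smoothed = label_smoothed_loss(confident_logits, 0, mass=0.1)
print(plain < smoothed)  # the smoothed loss penalizes the confident output more
```

With `mass=0.0` the loss reduces to the ordinary negative log-likelihood of the true token. With a positive `mass`, the same over-confident output incurs a larger loss, which is consistent with the abstract's remark that training perplexity generally gets worse under label smoothing.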
Pages: 212-223
Number of pages: 12
Related papers
50 in total
  • [1] The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics
    Rei, Ricardo
    Guerreiro, Nuno M.
    Treviso, Marcos
    Lavie, Alon
    Coheur, Luisa
    Martins, Andre F. T.
    61ST CONFERENCE OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 2, 2023, : 1089 - 1105
  • [2] Towards a Better Understanding of Variations in Zero-Shot Neural Machine Translation Performance
    Tan, Shaomu
    Monz, Christof
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2023), 2023, : 13553 - 13568
  • [3] Towards Understanding Neural Machine Translation with Word Importance
    He, Shilin
    Tu, Zhaopeng
    Wang, Xing
    Wang, Longyue
    Lyu, Michael R.
    Shi, Shuming
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 953 - 962
  • [4] Towards Understanding Neural Machine Translation with Attention Heads' Importance
    Zhou, Zijie
    Zhu, Junguo
    Li, Weijiang
    APPLIED SCIENCES-BASEL, 2024, 14 (07):
  • [5] Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation
    Zhang, Songming
    Liang, Yunlong
    Wang, Shuaibo
    Chen, Yufeng
    Han, Wenjuan
    Liu, Jian
    Xu, Jinan
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 8062 - 8079
  • [6] Visualizing and Understanding Neural Machine Translation
    Ding, Yanzhuo
    Liu, Yang
    Luan, Huanbo
    Sun, Maosong
    PROCEEDINGS OF THE 55TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2017), VOL 1, 2017, : 1150 - 1159
  • [7] Understanding Data Augmentation in Neural Machine Translation: Two Perspectives towards Generalization
    Li, Guanlin
    Liu, Lemao
    Huang, Guoping
    Zhu, Conghui
    Zhao, Tiejun
    Shi, Shuming
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 5689 - 5695
  • [8] Towards Robust Neural Machine Translation
    Cheng, Yong
    Tu, Zhaopeng
    Meng, Fandong
    Zhai, Junjie
    Liu, Yang
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 1756 - 1766
  • [9] Towards a Better Integration of Fuzzy Matches in Neural Machine Translation through Data Augmentation
    Tezcan, Arda
    Bulte, Bram
    Vanroy, Bram
    INFORMATICS-BASEL, 2021, 8 (01):
  • [10] P-Transformer: Towards Better Document-to-Document Neural Machine Translation
    Li, Yachao
    Li, Junhui
    Jiang, Jing
    Tao, Shimin
    Yang, Hao
    Zhang, Min
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 3859 - 3870