Towards a Better Understanding of Label Smoothing in Neural Machine Translation

Cited by: 0
Authors
Gao, Yingbo [1]
Wang, Weiyue [1]
Herold, Christian [1]
Yang, Zijian [1]
Ney, Hermann [1]
Affiliations
[1] RWTH Aachen University, Computer Science Department, Human Language Technology and Pattern Recognition Group, D-52056 Aachen, Germany
Source
1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (AACL-IJCNLP 2020) | 2020
Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In order to combat overfitting and in pursuit of better generalization, label smoothing is widely applied in modern neural machine translation systems. The core idea is to penalize over-confident outputs and regularize the model so that its outputs do not diverge too much from some prior distribution. While training perplexity generally gets worse, label smoothing is found to consistently improve test performance. In this work, we aim to better understand label smoothing in the context of neural machine translation. Theoretically, we derive and explain exactly what label smoothing is optimizing for. Practically, we conduct extensive experiments by varying which tokens to smooth, tuning the probability mass to be deducted from the true targets and considering different prior distributions. We show that label smoothing is theoretically well-motivated, and by carefully choosing hyperparameters, the practical performance of strong neural machine translation systems can be further improved.
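The mechanism the abstract describes, deducting probability mass from the true target and redistributing it according to a prior, can be sketched as follows. This is a minimal illustration of the common formulation with a uniform prior, not the paper's own derivation or code; the function names, the 4-token vocabulary, and the mass value of 0.1 are all assumptions chosen for the example.

```python
import math


def smoothed_targets(true_index, prior, mass):
    """Move `mass` of probability from the true token onto `prior`.

    Hypothetical helper: the result is (1 - mass) * one_hot + mass * prior.
    """
    return [
        (1.0 - mass) * (1.0 if i == true_index else 0.0) + mass * p
        for i, p in enumerate(prior)
    ]


def cross_entropy(targets, probs):
    """Cross-entropy between a target distribution and model probabilities."""
    return -sum(t * math.log(q) for t, q in zip(targets, probs) if t > 0.0)


# Assumed toy setup: 4-token vocabulary, uniform prior, 0.1 smoothing mass.
vocab_size = 4
uniform_prior = [1.0 / vocab_size] * vocab_size
targets = smoothed_targets(true_index=2, prior=uniform_prior, mass=0.1)

# Two hypothetical model outputs: one over-confident, one matching the targets.
confident = [0.001, 0.001, 0.997, 0.001]
moderate = [0.025, 0.025, 0.925, 0.025]

loss_confident = cross_entropy(targets, confident)
loss_moderate = cross_entropy(targets, moderate)
print(loss_confident > loss_moderate)
```

Under the smoothed targets, the over-confident output incurs the higher loss, which is the sense in which label smoothing penalizes over-confidence. Swapping `uniform_prior` for another distribution corresponds to the "different prior distributions" the abstract mentions.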
Pages: 212-223
Page count: 12