Minimum Risk Training for Neural Machine Translation

Cited by: 0
Authors:
Shen, Shiqi [1 ]
Cheng, Yong [2 ]
He, Zhongjun [3 ]
He, Wei [3 ]
Wu, Hua [3 ]
Sun, Maosong [1 ]
Liu, Yang [1 ]
Affiliations:
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, State Key Lab Intelligent Technol & Syst, Beijing, Peoples R China
[2] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[3] Baidu Inc, Beijing, Peoples R China
Funding:
National Natural Science Foundation of China; National Research Foundation of Singapore;
Keywords: (none listed)
DOI: Not available
Chinese Library Classification: TP18 [Theory of Artificial Intelligence];
Subject classification codes: 081104; 0812; 0835; 1405;
Abstract:
We propose minimum risk training for end-to-end neural machine translation. Unlike conventional maximum likelihood estimation, minimum risk training is capable of optimizing model parameters directly with respect to arbitrary evaluation metrics, which are not necessarily differentiable. Experiments show that our approach achieves significant improvements over maximum likelihood estimation on a state-of-the-art neural machine translation system across various language pairs. Transparent to architectures, our approach can be applied to more neural networks and potentially benefit more NLP tasks.
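The abstract states the training criterion only in words. As a sketch of how such an objective is typically written for NMT (the notation below is introduced here for illustration: \( \{(x^{(s)}, y^{(s)})\}_{s=1}^{S} \) is the parallel training corpus, \( \mathcal{Y}(x) \) the set of candidate translations of \( x \), and \( \Delta(y, y^{(s)}) \) a sentence-level loss derived from the evaluation metric, e.g. negative smoothed BLEU), minimum risk training minimizes the expected loss, i.e. the risk, under the model distribution:

\[
\mathcal{R}(\theta) = \sum_{s=1}^{S} \mathbb{E}_{y \mid x^{(s)};\, \theta}\!\left[\Delta(y, y^{(s)})\right]
= \sum_{s=1}^{S} \sum_{y \in \mathcal{Y}(x^{(s)})} P(y \mid x^{(s)}; \theta)\, \Delta(y, y^{(s)}).
\]

Since \( \mathcal{Y}(x^{(s)}) \) is exponentially large, the expectation is in practice approximated on a sampled subset \( \mathcal{S}(x^{(s)}) \subset \mathcal{Y}(x^{(s)}) \) with a renormalized distribution

\[
Q(y \mid x^{(s)}; \theta, \alpha) = \frac{P(y \mid x^{(s)}; \theta)^{\alpha}}{\sum_{y' \in \mathcal{S}(x^{(s)})} P(y' \mid x^{(s)}; \theta)^{\alpha}},
\]

where the hyperparameter \( \alpha \) controls the sharpness of the distribution. Crucially, the metric enters the objective only through the scalar weights \( \Delta(y, y^{(s)}) \), so the risk remains differentiable in \( \theta \) even when the metric itself is not, which is what permits direct optimization toward BLEU and other non-differentiable measures.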
Pages: 1683-1692 (10 pages)
Related Papers
Showing items [21]-[30] of 50
  • [21] Training Deeper Neural Machine Translation Models with Transparent Attention
    Bapna, Ankur
    Chen, Mia Xu
    Firat, Orhan
    Cao, Yuan
    Wu, Yonghui
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3028 - 3033
  • [22] Training Google Neural Machine Translation on an Intel CPU Cluster
    Kalamkar, Dhiraj D.
    Banerjee, Kunal
    Srinivasan, Sudarshan
    Sridharan, Srinivas
    Georganas, Evangelos
    Smorkalov, Mikhail E.
    Xu, Cong
    Heinecke, Alexander
    2019 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2019, : 193 - 202
  • [23] From Bilingual to Multilingual Neural Machine Translation by Incremental Training
    Escolano, Carlos
    Costa-Jussa, Marta R.
    Fonollosa, Jose A. R.
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019): STUDENT RESEARCH WORKSHOP, 2019, : 236 - 242
  • [24] Alternated Training with Synthetic and Authentic Data for Neural Machine Translation
    Jiao, Rui
    Yang, Zonghan
    Sun, Maosong
    Liu, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1828 - 1834
  • [25] Training with Adversaries to Improve Faithfulness of Attention in Neural Machine Translation
    Moradi, Pooya
    Kambhatla, Nishant
    Sarkar, Anoop
    AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2020, : 86 - 93
  • [26] On the Copying Behaviors of Pre-Training for Neural Machine Translation
    Liu, Xuebo
    Wang, Longyue
    Wong, Derek F.
    Ding, Liang
    Chao, Lidia S.
    Shi, Shuming
    Tu, Zhaopeng
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4265 - 4275
  • [27] Effectively training neural machine translation models with monolingual data
    Yang, Zhen
    Chen, Wei
    Wang, Feng
    Xu, Bo
    NEUROCOMPUTING, 2019, 333 : 240 - 247
  • [28] Token-level Adaptive Training for Neural Machine Translation
    Gu, Shuhao
    Zhang, Jinchao
    Meng, Fandong
    Feng, Yang
    Xie, Wanying
    Zhou, Jie
    Yu, Dong
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 1035 - 1046
  • [29] Curriculum pre-training for stylized neural machine translation
    Zou, Aixiao
    Wu, Xuanxuan
    Li, Xinjie
    Zhang, Ting
    Cui, Fuwei
    Xu, Jinan
    APPLIED INTELLIGENCE, 2024, 54 (17-18) : 7958 - 7968
  • [30] Bridging the Gap between Training and Inference for Neural Machine Translation
    Zhang, Wen
    Feng, Yang
    Liu, Qun
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 4790 - 4794