Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation

Cited by: 0
Authors: Junczys-Dowmunt, Marcin [1]
Affiliation: [1] Microsoft, One Microsoft Way, Redmond, WA 98052, USA
Keywords: none listed
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Subject classification codes: 081104; 0812; 0835; 1405
Abstract
This paper describes the Microsoft Translator submissions to the WMT19 news translation shared task for English-German. Our main focus is document-level neural machine translation with deep transformer models. We start with strong sentence-level baselines, trained on large-scale data created via data-filtering and noisy back-translation, and find that back-translation seems to mainly help with translationese input. We explore fine-tuning techniques, deeper models, and different ensembling strategies to counter these effects. Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models on them. We experiment with data augmentation techniques for the smaller authentic data with document boundaries and for the larger authentic data without boundaries. We further explore multi-task training that incorporates document-level source-language monolingual data via the BERT objective on the encoder, as well as two-pass decoding for combinations of sentence-level and document-level systems. Preliminary human evaluation results indicate that evaluators strongly prefer the document-level systems over our comparable sentence-level system. The document-level systems also seem to score higher than the human references in source-based direct assessment.
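As a rough illustration of the document-level data preparation described in the abstract, the sketch below concatenates consecutive sentence pairs of one document into training sequences capped at roughly 1000 subword tokens. This is a minimal sketch under stated assumptions, not the authors' implementation: the per-side budget, the hypothetical "<sep>" sentence-boundary marker, and pre-tokenized subword input are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

MAX_SUBWORDS = 1000   # assumed per-side cap; the paper reports sequences of up to 1000 subwords
SENT_SEP = "<sep>"    # hypothetical sentence-boundary marker, not taken from the paper


def make_doc_sequences(doc_pairs: Iterable[Tuple[List[str], List[str]]]):
    """Yield (source, target) subword sequences built from the consecutive
    sentence pairs of a single document, flushing whenever adding the next
    sentence would exceed the budget on either side."""
    src_buf: List[str] = []
    tgt_buf: List[str] = []
    for src, tgt in doc_pairs:
        # Start a new training example if the next sentence would overflow
        # either the source or the target budget.
        if src_buf and (len(src_buf) + len(src) + 1 > MAX_SUBWORDS
                        or len(tgt_buf) + len(tgt) + 1 > MAX_SUBWORDS):
            yield src_buf, tgt_buf
            src_buf, tgt_buf = [], []
        if src_buf:
            src_buf.append(SENT_SEP)
            tgt_buf.append(SENT_SEP)
        src_buf.extend(src)
        tgt_buf.extend(tgt)
    if src_buf:  # flush the last, possibly partial, sequence
        yield src_buf, tgt_buf
```

Sentences are kept in document order, so a single over-long sentence still produces one (oversized) example; whether such cases are truncated or dropped, and how sentence boundaries are actually marked, are details the paper's setup would determine and this sketch leaves open.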
Pages: 225-233 (9 pages)
Related papers (items [21]-[30] of 50)
  • [21] Document-Level Neural Machine Translation with Hierarchical Modeling of Global Context
    Tan, Xin
    Zhang, Long-Yin
    Zhou, Guo-Dong
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2022, 37 (02) : 295 - 308
  • [22] Addressing the Length Bias Problem in Document-Level Neural Machine Translation
    Zhang, Zhuocheng
    Gu, Shuhao
    Zhang, Min
    Feng, Yang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11545 - 11556
  • [23] Toward Understanding Most of the Context in Document-Level Neural Machine Translation
    Choi, Gyu-Hyeon
    Shin, Jong-Hun
    Lee, Yo-Han
    Kim, Young-Kil
    ELECTRONICS, 2022, 11 (15)
  • [24] Learning Contextualized Sentence Representations for Document-Level Neural Machine Translation
    Zhang, Pei
    Zhang, Xu
    Chen, Wei
    Yu, Jian
    Wang, Yanfeng
    Xiong, Deyi
    ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, 325 : 2298 - 2305
  • [25] Document-level Neural Machine Translation Using BERT as Context Encoder
    Guo, Zhiyu
    Nguyen, Minh Le
    AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2020, : 94 - 100
  • [27] Hierarchical Modeling of Global Context for Document-Level Neural Machine Translation
    Tan, Xin
    Zhang, Longyin
    Xiong, Deyi
    Zhou, Guodong
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 1576 - 1585
  • [28] TANDO: A Corpus for Document-level Machine Translation
    Gete, Harritxu
    Etchegoyhen, Thierry
    Ponce, David
    Labaka, Gorka
    Aranberri, Nora
    Corral, Ander
    Saralegi, Xabier
    Santos, Igor Ellakuria
    Martin, Maite
    LREC 2022: THIRTEENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 3026 - 3037
  • [29] DocRED: A Large-Scale Document-Level Relation Extraction Dataset
    Yao, Yuan
    Ye, Deming
    Li, Peng
    Han, Xu
    Lin, Yankai
    Liu, Zhenghao
    Liu, Zhiyuan
    Huang, Lixin
    Zhou, Jie
    Sun, Maosong
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 764 - 777
  • [30] Semantically Constrained Document-Level Chinese-Mongolian Neural Machine Translation
    Li, Haoran
    Hou, Hongxu
    Wu, Nier
    Jia, Xiaoning
    Chang, Xin
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021