Exploring the Role of Monolingual Data in Cross-Attention Pre-training for Neural Machine Translation

Times cited: 0
Authors
Khang Pham [1 ,2 ]
Long Nguyen [1 ,2 ]
Dien Dinh [1 ,2 ]
Affiliations
[1] Univ Sci, Fac Informat Technol, Ho Chi Minh City, Vietnam
[2] Vietnam Natl Univ, Ho Chi Minh City, Vietnam
Keywords
Natural Language Processing; Neural Machine Translation; Pre-training; Cross-attention; Monolingual data;
DOI
10.1007/978-3-031-41456-5_14
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advances in large pre-trained language models have revolutionized the field of natural language processing (NLP). Despite impressive results on a wide range of NLP tasks, their effectiveness in neural machine translation (NMT) remains limited. The main challenge lies in the mismatch between the pre-training objective of the language model and the translation task: language modeling focuses on reconstructing a single language without considering its semantic interaction with other languages. As a result, the cross-attention weights are randomly initialized and learned from scratch during NMT training. To overcome this issue, one approach is to use joint monolingual corpora to pre-train the cross-attention weights, improving the semantic interaction between the source and target languages. In this paper, we perform extensive experiments to analyze the impact of monolingual data on this pre-training approach and demonstrate its effectiveness in enhancing NMT performance.
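The record does not include code, so the following is only a minimal illustrative sketch of the idea described in the abstract, assuming a standard PyTorch Transformer: a cross-attention block whose weights were learned during a monolingual pre-training stage (represented here by the hypothetical name pretrained_cross_attn) is copied into an NMT decoder layer before fine-tuning on parallel data, instead of leaving that layer randomly initialized. Dimensions and module names are assumptions, not the authors' implementation.

# Illustrative sketch only (not the paper's code): reuse cross-attention weights
# pre-trained on monolingual data to initialize an NMT decoder's cross-attention.
import torch.nn as nn

d_model, n_heads = 512, 8  # assumed model dimensions

# Cross-attention block assumed to have been trained with a monolingual
# (e.g., denoising) objective during the pre-training stage.
pretrained_cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Decoder layer of the NMT model to be fine-tuned on parallel data.
decoder_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)

# Copy the pre-trained weights into the encoder-decoder (cross) attention,
# so it is not learned from scratch during NMT training.
decoder_layer.multihead_attn.load_state_dict(pretrained_cross_attn.state_dict())

In practice the same copy would be applied to every decoder layer before fine-tuning on the parallel corpus.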
Pages: 179-190
Page count: 12