Fixing Translation Divergences in Parallel Corpora for Neural MT

Authors
Pham, MinhQuang [1, 2]
Crego, Josep [1]
Senellart, Jean [1]
Yvon, Francois [2]
Affiliations
[1] SYSTRAN, 5 Rue Feydeau, F-75002 Paris, France
[2] Univ Paris Saclay, CNRS, LIMSI, F-91405 Orsay, France
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Corpus-based approaches to machine translation rely on the availability of clean parallel corpora. Such resources are scarce, and because of the automatic processes involved in their preparation, they are often noisy. This paper describes an unsupervised method for detecting translation divergences in parallel sentences. We rely on a neural network that computes cross-lingual sentence similarity scores, which are then used to filter out divergent translations. Furthermore, the similarity scores predicted by the network are used to identify and fix some partial divergences, yielding additional parallel segments. We evaluate these methods on English-French and English-German machine translation tasks and show that training on the filtered or corrected corpora improves MT performance.
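The abstract describes a two-step use of similarity scores: drop divergent pairs, then repair partially divergent ones. It does not specify the network or the repair rule, so the sketch below is illustrative only. It swaps the neural model for a toy lexicon-based token score (the hypothetical `LEXICON`, `token_scores`, `sentence_score`), filters pairs whose mean score falls below a threshold, and "fixes" a partially divergent target by keeping its highest-scoring contiguous span using Kadane's maximum-subarray algorithm. The scoring function, threshold and span heuristic are all assumptions, not the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy English->French lexicon: a stand-in for the paper's
# neural cross-lingual similarity model, which is NOT reproduced here.
LEXICON: Dict[str, str] = {
    "the": "le", "cat": "chat", "sleeps": "dort", "on": "sur", "mat": "tapis",
}


def token_scores(src: Sequence[str], tgt: Sequence[str]) -> List[float]:
    """Score each target token: +1.0 if it translates some source token
    under LEXICON, -1.0 otherwise (toy proxy for learned alignment scores)."""
    translations = {LEXICON[w] for w in src if w in LEXICON}
    return [1.0 if w in translations else -1.0 for w in tgt]


def sentence_score(src: Sequence[str], tgt: Sequence[str]) -> float:
    """Sentence-level similarity as the mean token score, in [-1, 1]."""
    scores = token_scores(src, tgt)
    return sum(scores) / len(scores) if scores else -1.0


def filter_corpus(
    pairs: Sequence[Tuple[List[str], List[str]]], threshold: float = 0.0
) -> List[Tuple[List[str], List[str]]]:
    """Keep only the pairs whose sentence score reaches the threshold."""
    return [(s, t) for s, t in pairs if sentence_score(s, t) >= threshold]


def best_span(scores: Sequence[float]) -> Tuple[int, int]:
    """Kadane's algorithm: return [start, end) of the max-sum contiguous span."""
    best_sum, best = float("-inf"), (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart the span here.
            cur_sum, cur_start = s, i
        else:
            cur_sum += s
        if cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


def fix_target(src: Sequence[str], tgt: Sequence[str]) -> List[str]:
    """Hypothetical repair of a partial divergence: keep only the target
    span whose tokens are best supported by the source."""
    start, end = best_span(token_scores(src, tgt))
    return list(tgt[start:end])


if __name__ == "__main__":
    src = "the cat sleeps".split()
    pairs = [
        (src, "le chat dort".split()),                         # parallel
        (src, "il pleut beaucoup".split()),                    # fully divergent
        (src, "le chat dort mais il pleut beaucoup".split()),  # partially divergent
    ]
    print(len(filter_corpus(pairs)))
    print(fix_target(src, pairs[2][1]))
```

Run as a script, this keeps 1 of the 3 pairs and trims the partially divergent target to `['le', 'chat', 'dort']`. Using a maximum-sum span rather than "longest run of positive tokens" lets an isolated low-scoring token inside an otherwise well-aligned segment survive; whether the paper's correction behaves this way is not stated in the abstract.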
Pages: 2967-2973
Number of pages: 7