A Unified and Unsupervised Framework for Bilingual Phrase Alignment on Specialized Comparable Corpora

Authors
Liu, Jingshu [1, 2]
Morin, Emmanuel [1]
Saldarriaga, Sebastian Pena [2]
Lark, Joseph [2]
Affiliations
[1] Univ Nantes, UMR CNRS 6004, LS2N, Nantes, France
[2] Dictanova, Nantes, France
DOI
10.3233/FAIA200332
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Significant advances have been achieved in bilingual word-level alignment, yet phrase-level alignment remains a challenge. Moreover, the need for parallel data is a critical drawback for the alignment task. In particular, this makes multi-word terms very difficult to align in specialized domains. This work proposes a system that alleviates these two problems: a unified phrase representation model using cross-lingual word embeddings as input, and an unsupervised training algorithm inspired by recent work on neural machine translation. The system consists of a sequence-to-sequence architecture in which a short sequence encoder constructs cross-lingual representations of phrases of any length, and an LSTM network then decodes them with respect to their contexts. After training, the encoder provides cross-lingual phrase representations that can be compared without further transformation. Experiments on five specialized domain datasets show that the method obtains state-of-the-art results on the bilingual phrase alignment task, and improves alignment results for phrases of different lengths by a mean of 8.8 points in MAP.
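The abstract states that the encoded phrase representations can be compared directly and that results are reported in MAP. The sketch below illustrates that evaluation step only, under stated assumptions: it ranks target phrases by cosine similarity (an assumed similarity measure, since the abstract does not name one) and computes mean average precision. The vectors, phrases, and function names are hypothetical toy values, not the paper's encoder, data, or code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_candidates(source_vec, target_vecs):
    """Return target phrases sorted by decreasing similarity to the source."""
    return sorted(
        target_vecs,
        key=lambda phrase: cosine(source_vec, target_vecs[phrase]),
        reverse=True,
    )

def mean_average_precision(rankings, gold):
    """MAP over source phrases; gold maps each source to its reference set."""
    total = 0.0
    for source, ranked in rankings.items():
        hits, precision_sum = 0, 0.0
        for rank, candidate in enumerate(ranked, start=1):
            if candidate in gold[source]:
                hits += 1
                precision_sum += hits / rank
        total += precision_sum / len(gold[source]) if gold[source] else 0.0
    return total / len(rankings)

# Hypothetical phrase vectors standing in for encoder outputs.
source_vecs = {
    "breast cancer": [0.9, 0.1, 0.0],
    "blood test": [0.7, 0.4, 0.0],
}
target_vecs = {
    "cancer du sein": [0.8, 0.2, 0.1],
    "analyse de sang": [0.6, 0.7, 0.0],
    "tumeur": [0.0, 0.1, 0.9],
}
gold = {
    "breast cancer": {"cancer du sein"},
    "blood test": {"analyse de sang"},
}

rankings = {src: rank_candidates(vec, target_vecs) for src, vec in source_vecs.items()}
score = mean_average_precision(rankings, gold)
print(rankings)
print(f"MAP = {score:.2f}")
```

In this toy setup the second source phrase has its reference translation ranked second, so its average precision is 0.5 while the first is 1.0. Because phrases of any length share one vector space, the same ranking routine applies regardless of whether the source and target phrases have the same number of words.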
Pages: 2093-2100
Number of pages: 8