Revisiting comparable corpora in connected space

Cited by: 0
Authors
Zweigenbaum, Pierre [1]
Affiliations
[1] CNRS, LIMSI, UPR 3251, F-91403 Orsay, France
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Subject classification code
030303 ; 0501 ; 050102 ;
Abstract
Bilingual lexicon extraction from comparable corpora is generally addressed through two monolingual distributional spaces of context vectors connected through a (partial) bilingual lexicon. We sketch here an abstract view of the task where these two spaces are embedded into one common bilingual space, and the two comparable corpora are merged into one bilingual corpus. We show how this paradigm accounts for a variety of models proposed so far, and where a set of topics addressed so far take place in this framework: degree of comparability, ambiguity in the bilingual lexicon, where parallel corpora stand with respect to this view, e.g., to replace the bilingual lexicon. A first experiment, using comparable corpora built from parallel corpora, illustrates one way to put this framework into practice. We also outline how this paradigm suggests directions for future investigations. We finally discuss the current limitations of the model and directions to solve them.
Pages: 10
Related papers
50 in total
  • [31] Building and using multimodal comparable corpora for machine translation
    Afli, Haithem
    Barrault, Loic
    Schwenk, Holger
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (04) : 603 - 625
  • [32] Exploiting Comparable Corpora for Building and Expanding Terminological Resources
    Sadat, Fatiha
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : E13 - E16
  • [33] Recent advances in machine translation using comparable corpora
    Rapp, Reinhard
    Sharoff, Serge
    Zweigenbaum, Pierre
    NATURAL LANGUAGE ENGINEERING, 2016, 22 (04) : 501 - 516
  • [34] A light way to collect comparable corpora from the Web
    Aker, Ahmet
    Kanoulas, Evangelos
    Gaizauskas, Robert
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 15 - 20
  • [35] Using Comparable Corpora to Adapt a Translation Model to Domains
    Kaji, Hiroyuki
    Tsunakawa, Takashi
    Okada, Daisuke
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 2182 - 2188
  • [36] Vector Disambiguation for Translation Extraction from Comparable Corpora
    Apidianaki, Marianna
    Ljubesic, Nikola
    Fiser, Darja
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2013, 37 (02) : 193 - 202
  • [37] Extracting Multilingual Topics from Unaligned Comparable Corpora
    Jagarlamudi, Jagadeesh
    Daume, Hal, III
    ADVANCES IN INFORMATION RETRIEVAL, PROCEEDINGS, 2010, 5993 : 444 - 456
  • [38] Automatic Methods for the Extension of a Bilingual Dictionary using Comparable Corpora
    Rosner, Michael
    Sultana, Kurt
    LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 3790 - 3797
  • [39] Generalising lexical translation strategies for MT using comparable corpora
    Babych, Bogdan
    Sharoff, Serge
    Hartley, Anthony
    SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008, : 1338 - 1342
  • [40] Reflections on the Penn Discourse TreeBank, Comparable Corpora, and Complementary Annotation
    Prasad, Rashmi
    Webber, Bonnie
    Joshi, Aravind
    COMPUTATIONAL LINGUISTICS, 2014, 40 (04) : 921 - 950