Revisiting comparable corpora in connected space

Cited by: 0
Author
Zweigenbaum, Pierre [1]
Affiliation
[1] CNRS, LIMSI, UPR 3251, F-91403 Orsay, France
Keywords
DOI
Not available
Chinese Library Classification
H0 [Linguistics];
Subject classification codes
030303; 0501; 050102;
Abstract
Bilingual lexicon extraction from comparable corpora is generally addressed through two monolingual distributional spaces of context vectors connected through a (partial) bilingual lexicon. We sketch here an abstract view of the task where these two spaces are embedded into one common bilingual space, and the two comparable corpora are merged into one bilingual corpus. We show how this paradigm accounts for a variety of models proposed so far, and where a set of topics addressed so far take place in this framework: degree of comparability, ambiguity in the bilingual lexicon, where parallel corpora stand with respect to this view, e.g., to replace the bilingual lexicon. A first experiment, using comparable corpora built from parallel corpora, illustrates one way to put this framework into practice. We also outline how this paradigm suggests directions for future investigations. We finally discuss the current limitations of the model and directions to solve them.
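For readers unfamiliar with the setup named in the first sentence of the abstract, the following is a minimal sketch of the conventional approach: two monolingual spaces of context vectors connected through a partial bilingual lexicon. It is not the merged bilingual space the paper proposes. The toy sentences, the seed lexicon, the window size, the raw co-occurrence counts, and all function names are assumptions made for illustration only.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """For each word, count the words seen within +/- `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, lexicon):
    """Map a source context vector into the target space.

    Dimensions absent from the (partial) lexicon are dropped; a word with
    several translations contributes its count to each of them.
    """
    projected = Counter()
    for word, count in vector.items():
        for translation in lexicon.get(word, ()):
            projected[translation] += count
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(source_word, src_vectors, tgt_vectors, lexicon):
    """Rank target words by similarity to the projected source vector."""
    projected = translate_vector(src_vectors[source_word], lexicon)
    scores = [(cosine(projected, vec), tgt) for tgt, vec in tgt_vectors.items()]
    return sorted(scores, reverse=True)


# Hypothetical toy corpora and seed lexicon (illustration only).
english = [["the", "doctor", "treats", "the", "patient"],
           ["the", "nurse", "helps", "the", "patient"]]
french = [["le", "docteur", "soigne", "le", "patient"],
          ["le", "chat", "mange", "le", "poisson"]]
seed_lexicon = {"the": ["le"], "treats": ["soigne"], "patient": ["patient"]}

ranking = rank_candidates("doctor",
                          context_vectors(english),
                          context_vectors(french),
                          seed_lexicon)
best_score, best_word = ranking[0]
print(best_word)
```

In this toy setting "doctor" is not in the seed lexicon, yet its projected context vector lies closest to that of "docteur". Real systems typically replace raw counts with an association measure and use much larger corpora; those choices are outside the scope of this sketch.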
Pages: 10