Revisiting comparable corpora in connected space

Citations: 0
Authors
Zweigenbaum, Pierre [1]
Affiliation
[1] CNRS, LIMSI, UPR 3251, F-91403 Orsay, France
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
H0 [Linguistics];
Discipline classification code
030303 ; 0501 ; 050102 ;
Abstract
Bilingual lexicon extraction from comparable corpora is generally addressed through two monolingual distributional spaces of context vectors connected through a (partial) bilingual lexicon. We sketch here an abstract view of the task in which these two spaces are embedded into one common bilingual space, and the two comparable corpora are merged into one bilingual corpus. We show how this paradigm accounts for a variety of models proposed so far, and where a set of topics addressed so far fit into this framework: the degree of comparability, ambiguity in the bilingual lexicon, and where parallel corpora stand with respect to this view, e.g., as a replacement for the bilingual lexicon. A first experiment, using comparable corpora built from parallel corpora, illustrates one way to put this framework into practice. We also outline how this paradigm suggests directions for future investigations. Finally, we discuss the current limitations of the model and directions for addressing them.
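As an illustration of the baseline setting the abstract starts from (two monolingual context-vector spaces connected through a partial bilingual lexicon), here is a minimal Python sketch. It is not the model proposed in the paper. The toy corpora, the seed lexicon, the window size, and all function names (`context_vectors`, `translate_vector`, `cosine`, `rank_candidates`) are assumptions made up for this example.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def translate_vector(vector, seed_lexicon):
    """Project a source-language context vector into the target space.

    Context words absent from the (partial) lexicon are simply dropped;
    ambiguous entries spread their count over every listed translation.
    """
    projected = Counter()
    for word, count in vector.items():
        for target in seed_lexicon.get(word, ()):
            projected[target] += count
    return projected


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(source_word, src_vectors, tgt_vectors, seed_lexicon):
    """Rank target words by similarity to the projected source context vector."""
    projected = translate_vector(src_vectors[source_word], seed_lexicon)
    scored = [(cosine(projected, vec), tgt) for tgt, vec in tgt_vectors.items()]
    return [tgt for _, tgt in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical toy data: tiny "comparable" French and English corpora.
french = [["le", "chat", "boit", "du", "lait"],
          ["le", "chien", "mange", "la", "viande"]]
english = [["the", "cat", "drinks", "milk"],
           ["the", "dog", "eats", "meat"]]
# Partial seed lexicon: "chat" and "chien" are deliberately missing.
seed = {"le": ["the"], "boit": ["drinks"], "lait": ["milk"],
        "mange": ["eats"], "viande": ["meat"]}

fr_vectors = context_vectors(french)
en_vectors = context_vectors(english)
print(rank_candidates("chat", fr_vectors, en_vectors, seed)[0])
```

On this toy data the top-ranked candidate for "chat" is "cat", because the projected contexts of "chat" ("the", "drinks") overlap most with the contexts of "cat". Real systems typically replace raw counts with an association measure and use far larger corpora and lexicons.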
Pages: 10
Related papers
50 in total
  • [21] Collecting and Using Comparable Corpora for Statistical Machine Translation
    Skadina, Inguna
    Aker, Ahmet
    Mastropavlos, Nikos
    Su, Fangzhong
    Tufis, Dan
    Verlic, Mateja
    Vasiljevs, Andrejs
    Babych, Bogdan
    Clough, Paul
    Gaizauskas, Robert
    Glaros, Nikos
    Paramita, Monica Lestari
    Pinnis, Marcis
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 438 - 445
  • [22] Aligning Lay and Specialized Passages in Comparable Medical Corpora
    Deleger, Louise
    Zweigenbaum, Pierre
    EHEALTH BEYOND THE HORIZON - GET IT THERE, 2008, 136 : 89 - +
  • [23] Parallel Sentence Alignment from Biomedical Comparable Corpora
    Cardon, Remi
    Grabar, Natalia
    DIGITAL PERSONALIZED HEALTH AND MEDICINE, 2020, 270 : 362 - 366
  • [24] Improving Machine Translation Performance Using Comparable Corpora
    Eisele, Andreas
    Xu, Jia
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 35 - 41
  • [25] Extracting translation equivalents from bilingual comparable corpora
    Kaji, H
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (02): : 313 - 323
  • [27] Statistical Corpus and Language Comparison using Comparable Corpora
    Eckart, Thomas
    Quasthoff, Uwe
    LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2010, : 15 - 20
  • [28] Using specialized comparable corpora to evaluate student translations
    Pearson, J
    PALC'99: PRACTICAL APPLICATIONS IN LANGUAGE CORPORA, 2000, 1 : 541 - 552
  • [29] Word sense acquisition from bilingual comparable corpora
    Kaji, H
    HLT-NAACL 2003: HUMAN LANGUAGE TECHNOLOGY CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE MAIN CONFERENCE, 2003, : 111 - 118
  • [30] A Collection of Comparable Corpora for Under-resourced Languages
    Skadina, Inguna
    Aker, Ahmet
    Giouli, Voula
    Tufis, Dan
    Gaizauskas, Robert
    Mierina, Madara
    Mastropavlos, Nikos
    HUMAN LANGUAGE TECHNOLOGIES - THE BALTIC PERSPECTIVE, 2010, 219 : 161 - 168