Revisiting comparable corpora in connected space

Author
Zweigenbaum, Pierre [1]
Affiliation
[1] CNRS, LIMSI, UPR 3251, F-91403 Orsay, France
Keywords
DOI
Not available
Chinese Library Classification number
H0 [Linguistics]
Discipline classification code
030303; 0501; 050102
Abstract
Bilingual lexicon extraction from comparable corpora is generally addressed through two monolingual distributional spaces of context vectors connected through a (partial) bilingual lexicon. We sketch here an abstract view of the task in which these two spaces are embedded into one common bilingual space, and the two comparable corpora are merged into one bilingual corpus. We show how this paradigm accounts for a variety of models proposed so far, and where a set of previously addressed topics fit within this framework: degree of comparability, ambiguity in the bilingual lexicon, and where parallel corpora stand with respect to this view, e.g., as a replacement for the bilingual lexicon. A first experiment, using comparable corpora built from parallel corpora, illustrates one way to put this framework into practice. We also outline how this paradigm suggests directions for future investigations. Finally, we discuss the current limitations of the model and directions for addressing them.
Pages: 10