Noisy Parallel Corpus Filtering through Projected Word Embeddings

Authors
Kurfali, Murathan [1]
Ostling, Robert [1]
Affiliations
[1] Stockholm Univ, Dept Linguist, Stockholm, Sweden
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We present a very simple method for parallel text cleaning of low-resource languages, based on projection of word embeddings trained on large monolingual corpora in high-resource languages. In spite of its simplicity, we approach the strong baseline system in the downstream machine translation evaluation.
Pages: 277-281
Page count: 5