A Framework for the Construction of Monolingual and Cross-lingual Word Similarity Datasets

Cited by: 0
Authors
Camacho-Collados, Jose [1]
Pilehvar, Mohammad Taher [1]
Navigli, Roberto [1]
Affiliations
[1] Sapienza Univ Rome, Dept Comp Sci, Rome, Italy
Funding
European Research Council
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even those that are widely spoken such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, both at monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation, demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.
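The count of fifteen cross-lingual datasets matches the number of unordered pairs that can be formed from the six listed languages. The short Python sketch below only illustrates that arithmetic, assuming one dataset per language pair; the function name `language_pairs` is hypothetical and the code does not reproduce the authors' construction procedure.

```python
from itertools import combinations

# The six languages named in the abstract.
LANGUAGES = ["English", "Spanish", "French", "German", "Portuguese", "Farsi"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages.

    Assumption for illustration: one cross-lingual dataset per pair.
    """
    return list(combinations(languages, 2))


pairs = language_pairs(LANGUAGES)
print(len(pairs))  # 6 choose 2 = 15
for first, second in pairs:
    print(f"{first}-{second}")
```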
Pages: 1-7
Page count: 7