Word Embedding Evaluation for Sinhala

Authors
Lakmal, Dimuthu [1]
Ranathunga, Surangika [1]
Peramuna, Saman [1]
Herath, Indu [1]
Affiliations
[1] Univ Moratuwa, Dept Comp Sci & Engn, Katubedda 10400, Sri Lanka
Keywords
Word Embedding; Sinhala; Evaluation Methodologies
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
This paper presents the first comprehensive evaluation of different types of word embeddings for the Sinhala language. Three standard word embedding models, namely Word2Vec (both Skip-gram and CBOW), FastText, and GloVe, are evaluated using two types of evaluation: intrinsic and extrinsic. Word analogy and word relatedness tasks were used for the intrinsic evaluation, while sentiment analysis and part-of-speech (POS) tagging served as the extrinsic evaluation tasks. The benchmark datasets used for the intrinsic evaluations were carefully crafted to account for specific linguistic features of Sinhala. In general, FastText word embeddings with 300 dimensions achieved the highest accuracy across all the evaluation tasks, while GloVe reported the lowest results.
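The abstract names word analogy as one of the intrinsic evaluations but does not say how analogy questions were scored. A minimal sketch of one common scoring rule, the vector-offset method (often called 3CosAdd), is given below as an illustration only; it is an assumption, not the authors' stated procedure. The function names, the toy vocabulary (English placeholders rather than Sinhala words), and the two-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb, a, b, c):
    """Answer 'a is to b as c is to ?' with the vector-offset rule.

    Returns the vocabulary word whose vector is closest (by cosine) to
    b - a + c, excluding the three question words themselves.
    """
    target = [y - x + z for x, y, z in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(emb[w], target))


def analogy_accuracy(emb, questions):
    """Fraction of (a, b, c, expected) questions answered correctly."""
    hits = sum(solve_analogy(emb, a, b, c) == d for a, b, c, d in questions)
    return hits / len(questions)


# Toy embeddings invented for illustration; real vectors would come from a
# trained Word2Vec, FastText, or GloVe model.
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.2, 1.0],
    "woman": [0.2, -1.0],
    "apple": [-1.0, 0.1],
}

accuracy = analogy_accuracy(toy, [("man", "woman", "king", "queen")])
print(accuracy)
```

In practice the same accuracy function would be run once per embedding model over the full analogy benchmark, which makes the scores directly comparable across models.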
Pages: 1874-1881
Page count: 8