A comprehensive analysis of static word embeddings for Turkish

Times cited: 1
Authors
Saritas, Karahan [1 ]
Oz, Cahid Arda [1 ]
Gungor, Tunga [1 ]
Affiliations
[1] Bogazici Univ, Comp Engn Dept, TR-34342 Istanbul, Turkiye
Keywords
Static word embeddings; Contextual word embeddings; Embedding models; Turkish
DOI
10.1016/j.eswa.2024.124123
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. Word embedding models fall into two basic types: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word based on the specific contexts in which it appears. Many studies compare contextual or non-contextual embedding models within their respective groups in different languages. However, studies that compare the models across these two groups are very few, and there is no such study for Turkish. Such a comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in the field of Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
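The conversion from contextual to static embeddings mentioned in the abstract is commonly done by pooling a word's contextual vectors over many occurrences. The sketch below is illustrative only and is not the authors' pipeline; the checkpoint name dbmdz/bert-base-turkish-cased, the helper static_embedding, and the example sentences are assumptions chosen for the example.

```python
# Minimal sketch of one common way to turn contextual embeddings into static
# ones: mean-pool a word's contextual vectors over many sentences containing it.
# NOTE: illustrative only; the model name and helper below are assumptions,
# not the method described in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dbmdz/bert-base-turkish-cased"  # assumed Turkish BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def static_embedding(word, sentences):
    """Average the contextual embeddings of `word` over the given sentences."""
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        ids = enc["input_ids"][0].tolist()
        # Locate the word's subword span and mean-pool its hidden states.
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                vectors.append(hidden[i:i + len(word_ids)].mean(dim=0))
                break
    return torch.stack(vectors).mean(dim=0)  # one fixed-length vector per word

# Example: a static vector for the Turkish word "kitap" (book).
vec = static_embedding("kitap", ["Dün bir kitap aldım.", "Bu kitap çok güzel."])
```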
Pages: 11