A comprehensive analysis of static word embeddings for Turkish

Cited by: 1
Authors: Saritas, Karahan [1]; Oz, Cahid Arda [1]; Gungor, Tunga [1]
Affiliations: [1] Bogazici Univ, Comp Engn Dept, TR-34342 Istanbul, Turkiye
Keywords: Static word embeddings; Contextual word embeddings; Embedding models; Turkish
DOI: 10.1016/j.eswa.2024.124123
Chinese Library Classification (CLC): TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. There are two basic types of word embedding models: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word based on the specific contexts in which it appears. Many studies compare contextual and non-contextual embedding models within their respective groups in different languages. However, studies that compare models across these two groups are very few, and no such study exists for Turkish. Such a comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
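The abstract mentions converting contextual embeddings into static ones but does not specify the method. A minimal sketch of one common approach is to average a word's contextual vectors over many of its occurrences, yielding a single fixed vector. Here `contextual_embed` is a hypothetical stand-in for a real contextual encoder (e.g. a Turkish BERT); it returns a deterministic pseudo-random vector so the example is self-contained.

```python
import numpy as np

DIM = 8  # toy embedding dimension for the sketch


def contextual_embed(word: str, context: str) -> np.ndarray:
    # Stand-in for a contextual encoder: seed an RNG from the
    # (word, context) pair so the same occurrence always yields
    # the same vector. A real pipeline would run the sentence
    # through the encoder and take the word's token vector.
    seed = sum(ord(ch) for ch in word + context)
    return np.random.default_rng(seed).normal(size=DIM)


def static_embedding(word: str, contexts: list[str]) -> np.ndarray:
    # Pool (mean) the word's contextual vectors across all its
    # contexts, producing one fixed vector that can be used like
    # the output of a static model such as word2vec or GloVe.
    vectors = [contextual_embed(word, c) for c in contexts]
    return np.mean(vectors, axis=0)


# Example: an ambiguous Turkish word seen in three different sentences.
contexts = [
    "bankaya para yatirdim",
    "nehir kenarindaki banka",
    "banka kredisi aldim",
]
vec = static_embedding("banka", contexts)
```

Mean pooling is only one design choice; other conversion strategies in the literature include taking the vector of the word in isolation or applying dimensionality reduction over the pooled vectors.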
Pages: 11