A comprehensive analysis of static word embeddings for Turkish

Cited by: 1
Authors
Saritas, Karahan [1 ]
Oz, Cahid Arda [1 ]
Gungor, Tunga [1 ]
Affiliations
[1] Bogazici Univ, Comp Engn Dept, TR-34342 Istanbul, Turkiye
Keywords
Static word embeddings; Contextual word embeddings; Embedding models; Turkish
DOI
10.1016/j.eswa.2024.124123
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. There are two basic types of word embedding models: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word based on the specific contexts in which it appears. Many studies compare contextual or non-contextual embedding models within their respective groups in different languages. However, studies that compare the models across these two groups are very few, and no such study exists for Turkish. Such a comparison necessitates converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in the field of Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
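The abstract notes that comparing the two groups requires converting contextual embeddings into static ones. A common way to do this (the record does not specify the paper's exact method) is to pool a word's contextual vectors over its occurrences in a corpus, e.g. by mean pooling. The sketch below illustrates that idea with toy vectors; in practice each vector would come from a contextual model such as BERTurk, one per occurrence of the word, and the function name and data are illustrative.

```python
import numpy as np

def pool_static_embeddings(contextual):
    """Collapse per-occurrence contextual vectors into a single static
    vector per word by averaging over all contexts (mean pooling)."""
    static = {}
    for word, vectors in contextual.items():
        stacked = np.stack(vectors)          # shape: (n_occurrences, dim)
        static[word] = stacked.mean(axis=0)  # shape: (dim,)
    return static

# Toy stand-in for contextual embeddings: two occurrences of "kitap"
# (book) and one of "ev" (house), each with a 2-dimensional vector.
contextual = {
    "kitap": [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
    "ev":    [np.array([2.0, 2.0])],
}
static = pool_static_embeddings(contextual)
print(static["kitap"])  # → [0.5 0.5]
```

The resulting per-word vectors can then be evaluated alongside genuinely static models (e.g. word2vec or fastText) in the same intrinsic and extrinsic benchmarks.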
Pages: 11
Related Papers
50 in total
  • [31] Sentiment analysis with covariate-assisted word embeddings
    Xu, Shirong
    Dai, Ben
    Wang, Junhui
    ELECTRONIC JOURNAL OF STATISTICS, 2021, 15 (01): : 3015 - 3039
  • [32] Analysis of The Characteristics of Similar Words Computed by Word Embeddings
    Zhou, Shuhui
    Liu, Peihan
    Liu, Lizhen
    Song, Wei
    Cheng, Miaomiao
    PROCEEDINGS OF 2020 IEEE 10TH INTERNATIONAL CONFERENCE ON ELECTRONICS INFORMATION AND EMERGENCY COMMUNICATION (ICEIEC 2020), 2020, : 327 - 330
  • [33] Analyzing Distances in Word Embeddings and Their Relation with Seme Analysis
    Gijon Agudo, Manuel
    Vilalta Arias, Armand
    Garcia-Gasulla, Dario
    ARTIFICIAL INTELLIGENCE RESEARCH AND DEVELOPMENT, 2019, 319 : 407 - 416
  • [34] Multi-channel word embeddings for sentiment analysis
    Lin, Jhe-Wei
    Thanh, Tran Duy
    Chang, Rong-Guey
    SOFT COMPUTING, 2022, 26 (22) : 12703 - 12715
  • [35] Socialized Word Embeddings
    Zeng, Ziqian
    Yin, Yichun
    Song, Yangqiu
    Zhang, Ming
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3915 - 3921
  • [36] The lexical context in a style analysis: A word embeddings approach
    Kubat, Miroslav
    Hula, Jan
    Chen, Xinying
    Cech, Radek
    Milicka, Jiri
    CORPUS LINGUISTICS AND LINGUISTIC THEORY, 2021, 17 (02) : 443 - 464
  • [37] An analysis of hierarchical text classification using word embeddings
    Stein, Roger Alan
    Jaques, Patricia A.
    Valiati, Joao Francisco
    INFORMATION SCIENCES, 2019, 471 : 216 - 232
  • [39] Refining Word Embeddings with Sentiment Information for Sentiment Analysis
    Kasri, M.
    Birjali, M.
    Nabil, M.
    Beni-Hssane, A.
    El-Ansari, A.
    El Fissaoui, M.
    Journal of ICT Standardization, 2022, 10 (03): : 353 - 382
  • [40] Word Embeddings for the Analysis of Ideological Placement in Parliamentary Corpora
    Rheault, Ludovic
    Cochrane, Christopher
    POLITICAL ANALYSIS, 2020, 28 (01) : 112 - 133