A comprehensive analysis of static word embeddings for Turkish

Times cited: 1
Authors
Saritas, Karahan [1 ]
Oz, Cahid Arda [1 ]
Gungor, Tunga [1 ]
Affiliations
[1] Bogazici Univ, Comp Engn Dept, TR-34342 Istanbul, Turkiye
Keywords
Static word embeddings; Contextual word embeddings; Embedding models; Turkish
DOI
10.1016/j.eswa.2024.124123
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Word embeddings are fixed-length, dense, and distributed word representations used in natural language processing (NLP) applications. Word embedding models fall into two basic types: non-contextual (static) models and contextual models. The former generates a single embedding for a word regardless of its context, while the latter produces distinct embeddings for a word based on the specific contexts in which it appears. Many studies compare contextual or non-contextual embedding models within their respective groups in different languages. However, studies that compare the models across these two groups are very few, and there is no such study for Turkish. Such a comparison requires converting contextual embeddings into static embeddings. In this paper, we compare and evaluate the performance of several contextual and non-contextual models in both intrinsic and extrinsic evaluation settings for Turkish. We make a fine-grained comparison by analyzing the syntactic and semantic capabilities of the models separately. The results of the analyses provide insights into the suitability of different embedding models for different types of NLP tasks. We also build a Turkish word embedding repository comprising the embedding models used in this work, which may serve as a valuable resource for researchers and practitioners in the field of Turkish NLP. We make the word embeddings, scripts, and evaluation datasets publicly available.
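The conversion from contextual to static embeddings mentioned in the abstract is commonly done by pooling a word's contextual vectors over many occurrences. The sketch below is illustrative only and is not the authors' pipeline; the checkpoint name dbmdz/bert-base-turkish-cased, the helper static_embedding, and the example sentences are assumptions chosen for the example.

```python
# Minimal sketch of one common way to turn contextual embeddings into static
# ones: mean-pool a word's contextual vectors over many sentences containing it.
# NOTE: illustrative only; the model name and helper below are assumptions,
# not the method described in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "dbmdz/bert-base-turkish-cased"  # assumed Turkish BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

def static_embedding(word, sentences):
    """Average the contextual embeddings of `word` over the given sentences."""
    word_ids = tokenizer(word, add_special_tokens=False)["input_ids"]
    vectors = []
    for sent in sentences:
        enc = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
        ids = enc["input_ids"][0].tolist()
        # Locate the word's subword span and mean-pool its hidden states.
        for i in range(len(ids) - len(word_ids) + 1):
            if ids[i:i + len(word_ids)] == word_ids:
                vectors.append(hidden[i:i + len(word_ids)].mean(dim=0))
                break
    return torch.stack(vectors).mean(dim=0)  # one fixed-length vector per word

# Example: a static vector for the Turkish word "kitap" (book).
vec = static_embedding("kitap", ["Dün bir kitap aldım.", "Bu kitap çok güzel."])
```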
Pages: 11