Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0

Authors:

Yucesoy, Veysel [1]

Koc, Aykut [1]

Affiliations:

[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey

Source:

2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU) | 2016

Keywords:

Word embeddings; natural language processing; classification; Turkish similarity test set

DOI:

Not available

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronics and communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Word embedding, widely used in the literature, especially for English, is a technique that associates each word with a mathematical vector representation under which certain structural or semantic relations hold. There are some Turkish applications of this technique. Although it was designed with English in mind, it also performs satisfactorily for Turkish. In this study, the performance of Turkish word embeddings is analysed with respect to how well the training data suits the goal of the embedding. For this purpose, a new test set based on subject similarity in Turkish is introduced and used to measure the performance of the word embeddings. This set will be made publicly available for academic purposes. A subject classifier for labeled Turkish text corpora, which beats the state-of-the-art performance, is also proposed.


Pages: 1005-1008

Page count: 4

