Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0
Authors
Yucesoy, Veysel [1]
Koc, Aykut [1]
Affiliations
[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey
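Keywords
Word embeddings; natural language processing; classification; Turkish similarity test set
DOI
Not available
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract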
Word embedding, which is widely used in the literature, especially for English, is a technique that associates each word with a mathematical vector representation under which certain structural or semantic relations hold. There are some Turkish applications of this technique. Although it was designed with English in mind, it also performs satisfactorily for Turkish. In this study, the performance of Turkish word embeddings is analysed with respect to how well the training data suit the goal of the embedding. For this purpose, a new test set based on subject similarity in Turkish is introduced and used to measure the performance of the word embeddings. This set will be publicly available for academic purposes. A subject classifier for labeled Turkish text corpora that outperforms the state of the art is also proposed.
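As a rough illustration of what it means for semantic relations to "hold" in a vector representation, the sketch below compares toy word vectors by cosine similarity. The three-dimensional vectors, the example words, and the helper name cosine_similarity are all hypothetical and chosen only for illustration; they are not taken from the paper, its embeddings, or its test set.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumption: real embeddings are learned
# from a corpus and typically have hundreds of dimensions).
embeddings = {
    "futbol": [0.9, 0.1, 0.0],     # "football"
    "basketbol": [0.8, 0.2, 0.1],  # "basketball"
    "ekonomi": [0.0, 0.2, 0.9],    # "economy"
}

same_subject = cosine_similarity(embeddings["futbol"], embeddings["basketbol"])
different_subject = cosine_similarity(embeddings["futbol"], embeddings["ekonomi"])

# With these toy vectors, the two sports words are closer to each other
# than a sports word is to an economics word.
print(same_subject > different_subject)
```

A subject-similarity test could, under these assumptions, check whether words from the same subject score higher than words from different subjects; the actual format and scoring of the test set described in the abstract are not given here.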
Pages: 1005 - 1008
Number of pages: 4