Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0
Authors
Yucesoy, Veysel [1]
Koc, Aykut [1]
Affiliations
[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey
Keywords
Word embeddings; natural language processing; classification; Turkish similarity test set
DOI
Not available
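Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809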
Abstract
Word embedding, widely used in the literature, especially for English, is a technique that associates each word with a mathematical vector representation under which certain structural or semantic relations hold. The technique has some Turkish applications. Although it was designed with English in mind, it also performs satisfactorily for Turkish. This study analyses the performance of Turkish word embeddings with respect to how well the training data suits the goal of the embedding. To this end, a new Turkish test set based on subject similarity is introduced and used to measure the performance of the word embeddings. The set will be publicly available for academic purposes. A subject classifier for labeled Turkish text corpora, which beats the state-of-the-art performance, is also proposed.
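As a rough illustration of how a subject-similarity test set could be used to score word embeddings, the sketch below compares word vectors with cosine similarity and counts how often a thresholded score agrees with a same-subject label. The abstract does not state the test set's format or the evaluation metric, so the pair-with-label layout, the `pair_accuracy` helper, the 0.5 threshold, and the toy three-dimensional vectors are all assumptions made for illustration, not the authors' method or data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pair_accuracy(embeddings, test_pairs, threshold=0.5):
    """Fraction of labeled pairs whose thresholded cosine similarity matches the label.

    test_pairs holds (word_a, word_b, same_subject) tuples; this layout is a
    hypothetical stand-in for the test set described in the abstract.
    Pairs containing a word without a vector are skipped.
    """
    correct = 0
    scored = 0
    for word_a, word_b, same_subject in test_pairs:
        if word_a not in embeddings or word_b not in embeddings:
            continue
        score = cosine_similarity(embeddings[word_a], embeddings[word_b])
        predicted_same = score >= threshold
        correct += int(predicted_same == same_subject)
        scored += 1
    return correct / scored if scored else 0.0


# Toy vectors invented for this example; real embeddings would be trained on a corpus.
toy_embeddings = {
    "futbol": [0.9, 0.1, 0.0],
    "basketbol": [0.8, 0.2, 0.1],
    "ekonomi": [0.0, 0.1, 0.9],
    "borsa": [0.1, 0.0, 0.8],
}

# Hypothetical labeled pairs: True means the two words share a subject.
toy_pairs = [
    ("futbol", "basketbol", True),
    ("ekonomi", "borsa", True),
    ("futbol", "ekonomi", False),
    ("basketbol", "borsa", False),
]

accuracy = pair_accuracy(toy_embeddings, toy_pairs)
print(f"pair accuracy: {accuracy:.2f}")
```

Running the same scoring over embeddings trained on different corpora would give one way to compare how well each training set suits the subject-similarity goal, although the study itself may use a different measure.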
Pages: 1005-1008
Page count: 4