Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0
Authors
Yucesoy, Veysel [1]
Koc, Aykut [1]
Affiliations
[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey
Keywords
Word embeddings; natural language processing; classification; Turkish similarity test set
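DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract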
Word embedding, widely used in the literature and especially for English, is a technique that associates each word with a vector representation under which certain structural or semantic relations hold. The technique has also been applied to Turkish, and although it was designed with English in mind, it performs satisfactorily for Turkish as well. This study analyses how the performance of Turkish word embeddings depends on how well the training data suits the goal of the embedding. To measure this performance, a new Turkish test set based on subject similarity is introduced. The set will be made publicly available for academic purposes. A subject classifier for labeled Turkish text corpora that outperforms the state of the art is also proposed.
Pages: 1005 - 1008
Number of pages: 4