Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0
Authors
Yucesoy, Veysel [1]
Koc, Aykut [1]
Affiliations
[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey
Keywords
Word embeddings; natural language processing; classification; Turkish similarity test set
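DOI
Not available
Chinese Library Classification (CLC) codes
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809
Abstract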
Word embedding, which is widely used in the literature, especially for English, is a technique that associates each word with a mathematical vector representation under which some structural or semantic relations hold. There are some Turkish applications of this technique. Although it was designed with English in mind, it also performs satisfactorily for Turkish. In this study, the performance of Turkish word embeddings is analysed with respect to how well the training data suits the goal of the embedding. For this study, a new test set based on subject similarity in Turkish is introduced and used to measure the performance of the word embeddings. This set will be publicly available for academic purposes. A subject classifier for labeled Turkish text corpora, which beats the state-of-the-art performance, is also proposed.
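As a rough illustration of how a subject-similarity test set can score word embeddings, the sketch below checks whether a word lies closer (by cosine similarity) to a same-subject word than to a word from another subject. This record does not describe the actual test set format or the embedding model used in the paper, so the triple layout, the toy 3-dimensional vectors, and the function names are all assumptions made for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings; real embeddings would be trained on a
# Turkish corpus and have hundreds of dimensions.
embeddings = {
    "futbol": [0.9, 0.1, 0.0],
    "basketbol": [0.8, 0.2, 0.1],
    "ekonomi": [0.1, 0.9, 0.2],
    "borsa": [0.0, 0.8, 0.3],
}

# Hypothetical test items: (anchor, same-subject word, other-subject word).
test_items = [
    ("futbol", "basketbol", "ekonomi"),
    ("ekonomi", "borsa", "futbol"),
]


def subject_similarity_accuracy(vectors, items):
    """Fraction of items where the anchor is closer to the same-subject word."""
    correct = 0
    for anchor, same, other in items:
        sim_same = cosine_similarity(vectors[anchor], vectors[same])
        sim_other = cosine_similarity(vectors[anchor], vectors[other])
        if sim_same > sim_other:
            correct += 1
    return correct / len(items)


accuracy = subject_similarity_accuracy(embeddings, test_items)
print(f"Subject-similarity accuracy: {accuracy:.2f}")
```

Under this assumed setup, embeddings trained on data that matches the target subjects would be expected to score higher on such a test than embeddings trained on unrelated text, which is the kind of comparison the abstract describes.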
Pages: 1005-1008
Page count: 4