Effect of the Training Set on the Word Embeddings and Similarity Test Set for Turkish

Cited by: 0

Authors:

Yucesoy, Veysel [1]

Koc, Aykut [1]

Affiliations:

[1] ASELSAN Arastirma Merkezi, Akilli Veri Analitigi Arastirma Program Mudurlugu, Ankara, Turkey

Source:

2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU) | 2016

Keywords:

Word embeddings; natural language processing; classification; Turkish similarity test set;

DOI:

Not available

Chinese Library Classification (CLC) number:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Word embedding, widely used in the literature and especially for English, is a technique that associates each word with a mathematical vector representation under which certain structural or semantic relations hold. There are some Turkish applications of this technique. Although it was designed with English in mind, it also performs satisfactorily for Turkish. In this study, the performance of Turkish word embeddings is analysed with respect to how well the training data suits the goal of the embedding. For this purpose, a new test set based on subject similarity in Turkish is introduced and used to measure the performance of the word embeddings. This set will be made publicly available for academic purposes. A subject classifier for labeled Turkish text corpora, which outperforms the state of the art, is also proposed.


Pages: 1005-1008

Page count: 4

