Document Embedding based Supervised Methods for Turkish Text Classification

Authors
Celenli, Halil I. [1, 2]
Ozturk, S. Talha [1]
Sahin, Gurkan [1, 3]
Gerek, Aydin [1]
Ganiz, Murat C. [1]
Affiliations
[1] Marmara Univ, Muhendislik Fak, Bilgisayar Muhendisligi, Istanbul, Turkey
[2] Kocaeli Univ, Muhendislik Fak, Bilgisayar Muhendisligi, Kocaeli, Turkey
[3] Yildiz Tekn Univ, Elekt Elekt Fak, Bilgisayar Muhendisligi, Istanbul, Turkey
Keywords
Text Classification; Doc2Vec; Distributed Vector Representations; Embedding Models; Paragraph Vectors
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline code
081202
Abstract
Following the recent increase in the amount of available data, Deep Learning has become the most popular branch of Machine Learning. This trend is also visible in Natural Language Processing (NLP), especially since textual data can now be scraped from the World Wide Web in vast quantities and used in an unsupervised or semi-supervised manner. As a result, Deep Learning methods are being used more frequently. In this work we devise several classification methods based on the Paragraph Vector model (a.k.a. Doc2Vec), which represents documents as vectors. These include a k-Nearest Neighbor (k-NN) classifier, Support Vector Machines (SVM) and a Centroid Classifier (CC), all of which work on the paragraph vectors of documents, as well as a custom method that uses the cosine similarities between each document and the class centroids in Doc2Vec space as features. Our experiments combine a number of representations and classifiers in various ways. On the representation side, the Paragraph Vector model is compared with Term Frequency (tf) and Term Frequency-Inverse Document Frequency (tf-idf) representations, using SVM, k-NN, CC and Centroid Features Support Vector Machine (CFSVM) as classifiers.
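The abstract describes the centroid-based methods only briefly, so the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes document vectors have already been inferred (for example by a Doc2Vec model, which is not reproduced here) and that a class centroid is the arithmetic mean of its training vectors. `centroid_predict` illustrates the Centroid Classifier idea (assign the class whose centroid is most cosine-similar), and `centroid_features` builds one cosine-similarity feature per class, the kind of input a CFSVM-style method would hand to an SVM. All function names, class names and toy vectors are hypothetical, and the SVM training step itself is omitted.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def class_centroids(
    vectors: Sequence[Sequence[float]], labels: Sequence[str]
) -> Dict[str, List[float]]:
    """Assumption: a class centroid is the mean of that class's training vectors."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def centroid_features(vec: Sequence[float], centroids: Dict[str, List[float]]) -> List[float]:
    """One cosine-similarity feature per class, in sorted label order (hypothetical CFSVM-style input)."""
    return [cosine(vec, centroids[label]) for label in sorted(centroids)]


def centroid_predict(vec: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Centroid classifier: return the class whose centroid is most cosine-similar to vec."""
    return max(sorted(centroids), key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy 2-d "document vectors" standing in for Doc2Vec output.
train_vectors = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
train_labels = ["sports", "sports", "economy", "economy"]

centroids = class_centroids(train_vectors, train_labels)
query = [0.8, 0.3]
features = centroid_features(query, centroids)  # [similarity to economy, similarity to sports]
print(centroid_predict(query, centroids))  # sports
print(features)
```

In a CFSVM-style setup, the `centroid_features` output for every training document would form the rows of the SVM's training matrix, so the SVM sees only as many features as there are classes; the exact feature construction used in the paper may differ from this sketch.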
Pages: 477-482
Page count: 6