Machine Learning-Based Text Classification Comparison: Turkish Language Context

Cited by: 7
Authors
Alzoubi, Yehia Ibrahim [1]
Topcu, Ahmet E. [2]
Erkaya, Ahmed Enis [3]
Affiliations
[1] Amer Univ Middle East, Coll Business Adm, Egaila 54200, Kuwait
[2] Amer Univ Middle East, Coll Engn & Technol, Egaila 54200, Kuwait
[3] TUBITAK BILGEM Software Technol, Res Inst YTE, Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / No. 16
Keywords
Turkish texts; machine learning; text preprocessing; algorithm effectiveness
DOI
10.3390/app13169428
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
The growth in textual data, driven by the increased use of online services and the ease of accessing these data, has led to a rise in the number of text classification research papers. Text classification has a significant influence on several domains, such as news categorization, spam detection, and sentiment analysis. This work focuses on the classification of Turkish text, because only a few studies have been conducted in this context. We evaluate the proposed techniques using data obtained from customer inquiries submitted to an institution; each inquiry is assigned to one of the classes specified in the institution's internal procedures. The Support Vector Machine (SVM), Naive Bayes, Long Short-Term Memory (LSTM), Random Forest, and Logistic Regression algorithms were used to classify the data. The performance of each technique was then analyzed before and after data preparation, and the results were compared. The LSTM technique achieved the highest accuracy, 84%, surpassing the best result among the traditional techniques, the 78% accuracy of SVM. The techniques performed better once the number of categories in the dataset was reduced. Moreover, the findings show that data preparation and the balance between the number of classes and the amount of training data are significant factors influencing the techniques' performance. The findings of this study and the text classification technique used may also be applied to data in languages other than Turkish.
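The abstract names Naive Bayes among the compared classifiers and stresses that data preparation affects results. As a minimal, standard-library-only sketch of those two steps (not the study's actual pipeline), the code below applies Turkish-aware lowercasing, regex tokenization, stopword and digit removal, and then a multinomial Naive Bayes classifier with Laplace smoothing. The stopword list, the `billing`/`technical` labels, and the example inquiries are hypothetical and invented for illustration; they do not come from the paper's dataset. Turkish-aware lowercasing matters because Python's default `str.lower()` maps `I` to `i` (Turkish expects `ı`) and `İ` to `i` plus a combining dot.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, illustrative stopword list; NOT the list used in the study.
TURKISH_STOPWORDS = {"ve", "bir", "bu", "için", "ile", "da", "de", "mi"}


def turkish_lower(text):
    # Default str.lower() is locale-independent: "I" -> "i" and "İ" -> "i" + U+0307.
    # Turkish expects "I" -> "ı" and "İ" -> "i", so map those two letters first.
    return text.replace("I", "ı").replace("İ", "i").lower()


def preprocess(text):
    # \w matches Unicode letters, so ç, ğ, ı, ö, ş, ü stay inside tokens.
    tokens = re.findall(r"\w+", turkish_lower(text))
    return [t for t in tokens if t not in TURKISH_STOPWORDS and not t.isdigit()]


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, tokens):
        best_label, best_score = None, -math.inf
        v = len(self.vocab)
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior
            denom = self.total_words[c] + self.alpha * v
            for t in tokens:
                if t in self.vocab:  # unseen words are ignored
                    score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy inquiries (made up for illustration only).
TRAIN = [
    ("Fatura tutarı yanlış hesaplanmış", "billing"),
    ("Faturamı nasıl ödeyebilirim", "billing"),
    ("Ödeme iade talebim var", "billing"),
    ("İnternet bağlantım sürekli kopuyor", "technical"),
    ("Modem ışığı yanmıyor", "technical"),
    ("Bağlantı hızı çok düşük", "technical"),
]

model = MultinomialNB().fit([preprocess(t) for t, _ in TRAIN], [c for _, c in TRAIN])
print(model.predict_one(preprocess("Fatura ödeme")))  # billing
print(model.predict_one(preprocess("Modem bağlantı sorunu")))  # technical
```

Comparing a classifier's accuracy with and without a step such as `preprocess` on a held-out split is one simple way to probe the kind of before/after effect the abstract reports; the LSTM, SVM, Random Forest, and Logistic Regression models would need a library such as scikit-learn or a deep learning framework.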
Pages: 21