Machine Learning-Based Text Classification Comparison: Turkish Language Context

Cited by: 7
Authors
Alzoubi, Yehia Ibrahim [1]
Topcu, Ahmet E. [2]
Erkaya, Ahmed Enis [3]
Affiliations
[1] Amer Univ Middle East, Coll Business Adm, Egaila 54200, Kuwait
[2] Amer Univ Middle East, Coll Engn & Technol, Egaila 54200, Kuwait
[3] TUBITAK BILGEM Software Technol, Res Inst YTE, Ankara, Turkiye
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 16
Keywords
Turkish texts; machine learning; text preprocessing; algorithm effectiveness
DOI
10.3390/app13169428
Chinese Library Classification (CLC) number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The growth of textual data that has accompanied the increased use of online services, together with easy access to these data, has led to a rise in the number of research papers on text classification. Text classification has a significant influence on several domains, such as news categorization, spam detection, and sentiment analysis. This work focuses on the classification of Turkish text, since only a few studies have been conducted in this context. To evaluate the proposed techniques, we use data obtained from customer inquiries submitted to an institution. Each inquiry is assigned a class defined in the institution's internal procedures. The Support Vector Machine, Naive Bayes, Long Short-Term Memory, Random Forest, and Logistic Regression algorithms were used to classify the data. The performance of each technique was analyzed before and after data preparation, and the results were compared. The Long Short-Term Memory technique was the most accurate, achieving 84% accuracy and surpassing the best result among the traditional techniques, which was 78% for the Support Vector Machine. The techniques performed better once the number of categories in the dataset was reduced. Moreover, the findings show that data preparation and the balance between the number of classes and the size of the training set are significant factors influencing the techniques' performance. The findings of this study and the text classification technique utilized may also be applicable to data in languages other than Turkish.
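The before-and-after comparison described in the abstract can be illustrated with a minimal sketch. It covers only one of the five named algorithms (a hand-written multinomial Naive Bayes) and assumes a toy dataset, a stop-word list, and class labels that are all invented for illustration; none of them come from the paper, and this is not the authors' pipeline.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list, for illustration only (not from the paper).
STOP_WORDS = {"ve", "bir", "bu", "çok"}


def tokenize(text, preprocess):
    """Whitespace split, or lowercase + punctuation removal + stop-word filtering."""
    if not preprocess:
        return text.split()
    # Note: str.lower() is not Turkish-locale aware (e.g. "I" becomes "i", not "ı");
    # the toy data below avoids those letters.
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def train_nb(docs, labels, preprocess):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = tokenize(text, preprocess)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text, preprocess):
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokenize(text, preprocess):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(docs, labels, test_docs, test_labels, preprocess):
    """Train and evaluate with the same preprocessing setting."""
    model = train_nb(docs, labels, preprocess)
    hits = sum(predict_nb(model, d, preprocess) == y
               for d, y in zip(test_docs, test_labels))
    return hits / len(test_docs)


# Invented toy inquiries and labels (assumptions, not the paper's data).
train_docs = ["Fatura tutarı çok yüksek", "Fatura ödeme tarihi",
              "Şifre sıfırlama çalışmıyor", "Hesap şifre hatası"]
train_labels = ["billing", "billing", "account", "account"]
test_docs = ["FATURA, ödeme!", "şifre hatası?"]
test_labels = ["billing", "account"]

raw_acc = accuracy(train_docs, train_labels, test_docs, test_labels, preprocess=False)
clean_acc = accuracy(train_docs, train_labels, test_docs, test_labels, preprocess=True)
print(f"accuracy without preprocessing: {raw_acc:.2f}")
print(f"accuracy with preprocessing:    {clean_acc:.2f}")
```

On this toy data, case and punctuation differences prevent the raw tokens from matching the training vocabulary, so the preprocessed run scores higher. The effect size says nothing about the figures reported in the abstract.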
Pages: 21
Related papers
50 in total
  • [1] Improving automated Turkish text classification with learning-based algorithms
    Koksal, Omer
    Yilmaz, Eyup Halit
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11):
  • [2] Analytics of machine learning-based algorithms for text classification
    Hassan, Sayar Ul
    Ahamed, Jameel
    Ahmad, Khaleel
    [J]. Sustainable Operations and Computers, 2022, 3 : 238 - 248
  • [3] Turkish Text Classification with Machine Learning and Transfer Learning
    Aydogan, Murat
    Karci, Ali
    [J]. 2019 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND DATA PROCESSING (IDAP 2019), 2019,
  • [4] An Efficient Machine Learning-based Text Summarization in the Malayalam Language
    Haroon, Rosna P.
    Abdul Gafur, M.
    Barakkath Nisha, U.
    [J]. KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2022, 16 (06): 1778 - 1799
  • [5] Machine learning for Asian language text classification
    Peng, Fuchun
    Huang, Xiangji
    [J]. JOURNAL OF DOCUMENTATION, 2007, 63 (03) : 378 - 397
  • [6] RESEARCH ON THE TEXT CLASSIFICATION BASED ON NATURAL LANGUAGE PROCESSING AND MACHINE LEARNING
    Chen Keming
    Zheng Jianguo
    [J]. JOURNAL OF THE BALKAN TRIBOLOGICAL ASSOCIATION, 2016, 22 (03): 2484 - 2494
  • [7] Text classification for azerbaijani language using machine learning
    Suleymanov, Umid
    Kalejahi, Behnam Kiani
    Amrahov, Elkhan
    Badirkhanli, Rashid
    [J]. Computer Systems Science and Engineering, 2020, 35 (06): 467 - 475
  • [8] Text Classification for Azerbaijani Language Using Machine Learning
    Suleymanov, Umid
    Kalejahi, Behnam Kiani
    Amrahov, Elkhan
    Badirkhanli, Rashid
    [J]. COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2020, 35 (06): 467 - 475
  • [9] DOMAIN SPECIFIC SYNTAX BASED APPROACH FOR TEXT CLASSIFICATION IN MACHINE LEARNING CONTEXT
    Mohasseb, Alaa
    Bader-El-Den, Mohamed
    Liu, Han
    Cocea, Mihaela
    [J]. PROCEEDINGS OF 2017 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOL 2, 2017: 652 - 657
  • [10] Machine learning-based guilt detection in text
    Meque, Abdul Gafar Manuel
    Hussain, Nisar
    Sidorov, Grigori
    Gelbukh, Alexander
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)