Improving automated Turkish text classification with learning-based algorithms

Cited by: 5
Authors
Koksal, Omer [1]
Yilmaz, Eyup Halit [1]
Affiliation
[1] ASELSAN Res Ctr, Ankara, Turkey
Keywords
machine learning; natural language processing; news categorization; pre-trained language models; text classification; performance; impact
DOI
10.1002/cpe.6874
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Text classification is the process of assigning categories or tags to a document based on its content. Although text classification is a well-known process, it involves many steps that require tuning to improve the resulting mathematical models. This article presents a novel methodology and identifies key points for improving text classification performance using learning-based algorithms and techniques. First, to assess the effectiveness of the proposed methodology, we selected two public Turkish news benchmarking datasets. Then, we performed extensive testing using both supervised machine learning algorithms and state-of-the-art pre-trained language models. The experimental results show that our methodology outperforms previous news classification studies on these benchmarking datasets, improving categorization results in terms of F1-score. Therefore, we conclude that the presented methodology efficiently improves classification results and selects a feasible classifier for a given dataset.
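The abstract reports improvements in F1-score and says the methodology selects a feasible classifier for a given dataset. It does not state how F1 is averaged across news categories, so the following is only a minimal sketch under the assumption of macro-averaging (the unweighted mean of per-class F1, where F1 = 2PR / (P + R)). The function names `macro_f1` and `select_classifier`, the candidate names `classifier_a` and `classifier_b`, and the Turkish category labels are hypothetical placeholders, not the authors' code or results.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1  # correct prediction for class t
        else:
            fp[p] += 1  # predicted p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def select_classifier(y_true, predictions):
    """Return (name, score) of the candidate with the highest macro F1 (hypothetical helper)."""
    scores = {name: macro_f1(y_true, y_pred) for name, y_pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical gold labels and predictions for a tiny news-category test set.
y_true = ["spor", "spor", "ekonomi", "siyaset", "ekonomi", "siyaset"]
predictions = {
    "classifier_a": ["spor", "spor", "ekonomi", "ekonomi", "ekonomi", "siyaset"],
    "classifier_b": ["spor", "ekonomi", "ekonomi", "siyaset", "spor", "siyaset"],
}

best, score = select_classifier(y_true, predictions)
print(best, f"{score:.3f}")  # → classifier_a 0.822
```

Macro-averaging weights each category equally, so a classifier cannot score well just by favoring the largest news category. A weighted or micro average would be an equally plausible reading of the abstract.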
Pages: 23