Improving automated Turkish text classification with learning-based algorithms

Cited by: 5
Authors
Koksal, Omer [1]
Yilmaz, Eyup Halit [1]
Affiliation
[1] ASELSAN Res Ctr, Ankara, Turkey
Source
Keywords
machine learning; natural language processing; news categorization; pre-trained language models; text classification; PERFORMANCE; IMPACT
DOI
10.1002/cpe.6874
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
Text classification is the process of determining the categories or tags of a document based on its content. Although text classification is a well-known process, it has many steps that require tuning to improve the underlying mathematical models. This article provides a novel methodology and identifies key points for improving text classification performance using learning-based algorithms and techniques. First, to check the effectiveness of the proposed methodology, we selected two public Turkish news benchmarking datasets. Then, we performed extensive testing using both supervised machine learning algorithms and state-of-the-art pre-trained language models. The experimental results show that our methodology outperforms previous news classification studies on these benchmarking datasets, improving categorization results as measured by F1-score. Therefore, we conclude that the presented methodology efficiently improves classification results and selects a feasible classifier for a given dataset.
Pages: 23