Multi-class Sports News Categorization using Machine Learning Techniques: Resource Creation and Evaluation

Cited by: 4
Authors
Barua, Adrita [1]
Sharif, Omar [1]
Hoque, Mohammed Moshiul [1]
Affiliations
[1] Chittagong University of Engineering & Technology, Department of Computer Science & Engineering, Chattogram 4349, Bangladesh
Keywords
Natural language processing; Text categorization; Bengali language processing; News classification; Machine learning
DOI
10.1016/j.procs.2021.11.002
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
The proliferation of Internet and social media usage generates enormous amounts of textual data (specifically, news content) on the web. Most of this content is unstructured. Extracting meaningful insights from unstructured content by human labor is extremely hard and time-consuming. Thus, automatic text classification has gained much attention from NLP experts in recent years. Several techniques have been developed to classify news text in high-resource languages (e.g., English, Chinese, French). However, the automatic classification of Bengali news text is still at an early stage. This paper investigates six popular machine learning techniques (such as Logistic Regression (LR), Support Vector Classifier (SVC), Decision Tree (DT), Multinomial Naive Bayes (MNB), and Random Forest (RF)) with Term Frequency-Inverse Document Frequency (TF-IDF) features for automatic sports news classification in Bengali. Because no benchmark corpus was available, this work also developed a Bengali news corpus (called BNeC) consisting of 43306 news documents with 202830 unique words in four classes: Cricket, Football, Tennis, and Athletics. Experimental results on the test dataset show that the Support Vector Classifier (SVC) with the unigram+bigram+trigram feature space obtained the highest weighted F1-score, 97.60%, among all classifiers and feature combinations. © 2021 The Authors. Published by Elsevier B.V.
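The two technical ingredients named in the abstract, a combined unigram+bigram+trigram TF-IDF feature space and the weighted F1-score, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names (`ngrams`, `tfidf`, `weighted_f1`), the whitespace tokenization, and the smoothed IDF variant are assumptions made here for illustration, and the classifier itself is omitted.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=3):
    """Return all word n-grams with n_min <= n <= n_max as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs):
    """Compute one {n-gram: weight} dict per document.

    Assumption: raw term counts times a smoothed IDF,
    ln((1 + N) / (1 + df)) + 1. This is one common variant and
    not necessarily the weighting used in the paper.
    """
    counts_per_doc = [Counter(ngrams(doc.split())) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(term for counts in counts_per_doc for term in counts)
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}
    return [{t: c * idf[t] for t, c in counts.items()} for counts in counts_per_doc]


def weighted_f1(y_true, y_pred):
    """Average per-class F1, weighting each class by its support in y_true."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = n - tp
        f1 = 2 * tp / (2 * tp + fp + fn)
        total += f1 * n
    return total / len(y_true)


if __name__ == "__main__":
    # Placeholder English tokens; the corpus described above is Bengali.
    weights = tfidf(["team wins match", "team loses match"])
    print(sorted(weights[0]))
    print(weighted_f1(
        ["cricket", "cricket", "football", "tennis"],
        ["cricket", "football", "football", "tennis"],
    ))
```

Weighting each class's F1 by its support keeps a large class from being under-represented in the average, which matters when the classes differ in size.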
Pages: 112-121
Number of pages: 10