Machine Learning Algorithms for Document Classification: Comparative Analysis

Authors
Rashid, Faizur [1]
Gargaare, Suleiman M. A. [2]
Aden, Abdulkadir H. [3]
Abdi, Afendi [4]
Affiliations
[1] Haramaya Univ, Dept Comp Sci, Almaya, Ethiopia
[2] Univ Hargeisa, Dept Comp Sci, Hargeisa, Somalia
[3] Bule Hora Univ, Dept Comp Sci, Bule Hora, Ethiopia
[4] Haramaya Univ, Dept Software Engn, Almaya, Ethiopia
Keywords
Document classification; machine learning algorithms; text classification; analysis
DOI
10.14569/IJACSA.2022.0130430
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Subject classification code
081202
Abstract
Automated document classification is a fundamental machine learning task that refers to automatically assigning categories to scanned images of documents. Although the field has reached the state of the art, the performance and efficiency of the algorithms still need to be verified by comparing them. The objective was to identify the most efficient classification algorithms according to the fundamentals of science. An experimental method was used, with data collected from a total of 1080 students and researchers at Ethiopian universities and from the Banknotes, Crowdsourced Mapping, and VxHeaven metadata sets provided by UC Irvine. Of the respondents, 25% felt that KNN is better than the other models. The overall performance of KNN, SVM, Perceptron, and Gaussian NB was analyzed through several parameters, namely accuracy (99.85%), precision (0.996), recall (100%), F-score (0.997), classification time, and running time. KNN performed better than the other classification algorithms, with a lower error rate of 0.0002 and the shortest classification and running times, approximately 413 and 3.6978 microseconds, respectively. Considering all the parameters, the study concludes that the KNN classifier is the best algorithm.
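The evaluation parameters named in the abstract are accuracy, precision, recall, F-score, error rate, and classification time. They can be illustrated with a minimal, dependency-free sketch of a k-nearest-neighbours (KNN) classifier. The sketch makes several assumptions. The labels are binary, the distance is Euclidean, and k = 3 is a hypothetical choice. The helper names `knn_predict` and `binary_metrics` and the toy points are invented for illustration. They are not the paper's datasets or its implementation, which the abstract does not describe.

```python
import math
import time
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Predict x's label by majority vote among its k nearest training points.

    Hypothetical helper: plain Euclidean distance, brute-force search.
    """
    dists = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, F-score and error rate for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f_score": f_score, "error_rate": 1.0 - accuracy}


# Toy two-cluster data (invented for illustration only).
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.05, 0.1), (1.0, 0.95), (0.15, 0.15), (0.95, 1.05)]
test_y = [0, 1, 0, 1]

# Time only the classification step, in microseconds.
start = time.perf_counter()
preds = [knn_predict(train_x, train_y, x, k=3) for x in test_x]
elapsed_us = (time.perf_counter() - start) * 1e6

metrics = binary_metrics(test_y, preds)
print(metrics, f"classification time: {elapsed_us:.1f} us")
```

Brute-force KNN sorts every training distance for each query, which is acceptable for a sketch but not for large datasets. A single timing run with `time.perf_counter` is also noisy. Comparing classification times such as those reported above would require repeated runs of every model on the same hardware.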
Pages: 260-265
Number of pages: 6