Feature Extraction based Text Classification using K-Nearest Neighbor Algorithm

Cited by: 0
Authors
Azam, Muhammad [1]
Ahmed, Tanvir [1]
Sabah, Fahad [1]
Hussain, Muhammad Iftikhar [2, 3]
Affiliations
[1] Super Univ Lahore, Dept Comp Sci & Informat Technol, Lahore, Pakistan
[2] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Engn Res Ctr IoT Software & Syst, Beijing 100124, Peoples R China
Keywords
K-NN; Naive Bayes; text classification; RapidMiner; feature extraction
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The number of scientific publications has been increasing enormously, and with this increase the classification of scientific publications is becoming a challenging task. The core objective of this research is to analyze the performance of classification algorithms on a Scopus dataset. In text classification, extracting features from documents and classifying documents with the extracted features are the major issues that degrade the performance of different algorithms. In this paper, the classification algorithms Naive Bayes (NB) and K-Nearest Neighbor (K-NN) showed improved performance when combined with Bayesian boosting and bagging. The selected classification algorithms were applied to 10K documents from Scopus and evaluated using the F-measure, and comparison matrices were produced to estimate accuracy, precision, and recall for the NB and K-NN classifiers. Further, data preprocessing and cleaning steps were applied to the selected dataset, and class imbalance issues were analyzed, to increase the performance of the text classification algorithms. Experimental results showed a performance gain of over 7% using K-NN, which performed better than NB.
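As a rough illustration of the kind of pipeline the abstract describes (feature extraction, K-NN classification, evaluation with precision, recall, and F-measure), the following is a minimal sketch in plain Python. It is not the authors' implementation: the keywords suggest the study was carried out in RapidMiner, and the sketch omits Bayesian boosting, bagging, and class-imbalance handling. The bag-of-words features, the cosine similarity measure, the value k=3, the toy corpus, and all function names are assumptions made for illustration only.

```python
import math
from collections import Counter


def extract_features(text):
    """Assumed feature extraction: bag-of-words counts over lowercased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to `text`.

    `train` is a list of (feature_vector, label) pairs.
    """
    query = extract_features(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure (F1) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy corpus; it is NOT taken from the Scopus dataset in the paper.
TRAIN_DOCS = [
    ("neural network training deep learning", "cs"),
    ("text classification machine learning algorithm", "cs"),
    ("database query index software", "cs"),
    ("protein cell gene expression", "bio"),
    ("gene sequencing dna cell", "bio"),
    ("enzyme protein binding cell", "bio"),
]
TEST_DOCS = [
    ("learning algorithm for text", "cs"),
    ("dna gene protein", "bio"),
]

TRAIN = [(extract_features(text), label) for text, label in TRAIN_DOCS]
y_true = [label for _, label in TEST_DOCS]
y_pred = [knn_predict(TRAIN, text, k=3) for text, _ in TEST_DOCS]

if __name__ == "__main__":
    print("predictions:", y_pred)
    print("precision, recall, F1 for 'cs':", precision_recall_f1(y_true, y_pred, "cs"))
```

On a real corpus, weighted features such as TF-IDF and a held-out evaluation split would likely be needed; raw term counts are used here only to keep the sketch self-contained.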
Pages: 95-101
Page count: 7
Related papers
50 in total
  • [1] Novel text classification based on K-nearest neighbor
    Yu, Xiao-Peng
    Yu, Xiao-Gao
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 3425 - +
  • [2] Comparative Analysis of K-Nearest Neighbor and Modified K-Nearest Neighbor Algorithm for Data Classification
    Okfalisa
    Mustakim
    Gazalba, Ikbal
    Reza, Nurul Gayatri Indah
    2017 2ND INTERNATIONAL CONFERENCES ON INFORMATION TECHNOLOGY, INFORMATION SYSTEMS AND ELECTRICAL ENGINEERING (ICITISEE): OPPORTUNITIES AND CHALLENGES ON BIG DATA FUTURE INNOVATION, 2017, : 294 - 298
  • [4] A Review of a Text Classification Technique: K-Nearest Neighbor
    Zhou, R. S.
    Wang, Z. J.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SYSTEMS AND INDUSTRIAL APPLICATIONS (CISIA 2015), 2015, 18 : 453 - 455
  • [5] K-Nearest Neighbor Algorithm Optimization in Text Categorization
    Chen, Shufeng
    2017 3RD INTERNATIONAL CONFERENCE ON ENVIRONMENTAL SCIENCE AND MATERIAL APPLICATION (ESMA2017), VOLS 1-4, 2018, 108
  • [6] Feature Based Classification of Nuclear Receptors and Their Subfamilies Using Fuzzy K-Nearest Neighbor
    Tiwari, Arvind Kumar
    Srivastava, Rajeev
    2015 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTER ENGINEERING AND APPLICATIONS (ICACEA), 2015, : 24 - 28
  • [7] A feature weighted K-nearest neighbor algorithm based on association rules
    Manzali Y.
    Barry K.A.
    Flouchi R.
    Balouki Y.
    Elfar M.
    Journal of Ambient Intelligence and Humanized Computing, 2024, 15 (07) : 2995 - 3008
  • [8] Protein kinase inhibitors’ classification using K-Nearest neighbor algorithm
    Arian, Roya
    Hariri, Amirali
    Mehridehnavi, Alireza
    Fassihi, Afshin
    Ghasemi, Fahimeh
    Computational Biology and Chemistry, 2020, 86
  • [9] An Improved K-Nearest Neighbor Algorithm for Pattern Classification
    Sultana, Zinnia
    Ferdousi, Ashifatul
    Tasnim, Farzana
    Nahar, Lutfun
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (08) : 760 - 767
  • [10] Arabic Text Classification Using K-Nearest Neighbour Algorithm
    Alhutaish, Roiss
    Omar, Nazlia
    INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2015, 12 (02) : 190 - 195