A comparative study using vector space model with K-nearest neighbor on text categorization data

Cited by: 0
Authors
Hadi, Wa'el Musa [1]
Thabtah, Fadi [2]
Abdel-jaber, Hussein [3]
Affiliations
[1] Arab Acad Banking & Financial Sci, Dept Comp Informat Syst, Amman, Jordan
[2] Philadelphia Univ, MIS Dept, Amman, Jordan
[3] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
Keywords
data mining; text categorization; term weighting; vector space model
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is one of the well-studied problems in data mining and information retrieval. Given a data set containing a large quantity of documents, each associated with its corresponding category, categorization involves building a model from the classified documents in order to classify previously unseen documents as accurately as possible. In this paper, we investigate variations of the vector space model using inverse document frequency (IDF) and weighted inverse document frequency (WIDF). Experimental results on eight different data sets provide evidence that the Cosine coefficient outperformed the Jaccard and Dice coefficient approaches with regard to F1 measure results, and that Cosine-based IDF achieved the highest average scores.
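As a rough illustration of the techniques named in the title and abstract, the following is a minimal sketch of IDF term weighting combined with the Cosine, Jaccard, and Dice coefficients inside a K-nearest neighbor classifier. It is not the authors' implementation: the function names, the toy documents, the use of raw term frequency, the natural-log IDF, and the majority-vote rule are all assumptions made for illustration, and the WIDF variant is left out because the record does not give its formula.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Compute IDF(t) = log(N / df(t)) over tokenized training documents (assumed formula)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    return {t: math.log(n / df[t]) for t in df}

def vectorize(doc, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in training are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(doc).items() if t in idf}

def dot(a, b):
    return sum(w * b[t] for t, w in a.items() if t in b)

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def jaccard(a, b):
    # Extended (weighted) Jaccard coefficient for real-valued vectors.
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0

def dice(a, b):
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0

def knn_predict(query, train_vecs, labels, sim, k=3):
    """Label a query vector by majority vote among its k most similar training vectors."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: sim(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus, for illustration only.
train = [["goal", "match", "team"], ["team", "score", "goal"],
         ["stock", "market", "price"], ["market", "trade", "price"]]
labels = ["sport", "sport", "finance", "finance"]

idf = fit_idf(train)
train_vecs = [vectorize(doc, idf) for doc in train]
query = vectorize(["goal", "team", "win"], idf)

for sim in (cosine, jaccard, dice):
    print(sim.__name__, knn_predict(query, train_vecs, labels, sim, k=3))
```

Swapping the `sim` argument is the only change needed to compare the three coefficients on the same weighted vectors, which mirrors the kind of comparison the abstract describes.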
Pages: 296 / +
Page count: 2