A comparative study using vector space model with K-nearest neighbor on text categorization data

Cited by: 0
Authors
Hadi, Wa'el Musa [1]
Thabtah, Fadi [2]
Abdel-jaber, Hussein [3]
Affiliations
[1] Arab Acad Banking & Financial Sci, Dept Comp Informat Syst, Amman, Jordan
[2] Philadelphia Univ, MIS Dept, Amman, Jordan
[3] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
Keywords
data mining; text categorization; term weighting; vector space model
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is one of the well-studied problems in data mining and information retrieval. Given a large set of documents, each associated with its corresponding category, categorization involves building a model from the classified documents in order to classify previously unseen documents as accurately as possible. In this paper, we investigate variations of the vector space model using inverse document frequency (IDF) and weighted inverse document frequency (WIDF). Experimental results on eight different data sets provide evidence that the Cosine coefficient outperformed the Jaccard and Dice coefficient approaches with regard to F1 measure results, and that Cosine-based IDF achieved the highest average scores.
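The record does not give the paper's formulas, so the following is only a minimal sketch of the ingredients the abstract names, using common textbook definitions as assumptions: IDF as log(N / df), WIDF as a term's frequency in a document divided by its total frequency across the collection, and the Cosine, Jaccard (extended) and Dice coefficients computed on weighted term vectors. All function names and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def idf_table(docs):
    """Assumed IDF form: log(N / df) over a list of tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf_vector(tokens, idf):
    """Weight one document as TF * IDF; terms unseen in training are dropped."""
    return {t: tf * idf[t] for t, tf in Counter(tokens).items() if t in idf}

def widf_vectors(docs):
    """Assumed WIDF form: tf(d, t) divided by the sum of tf(i, t) over all documents i."""
    totals = Counter(term for doc in docs for term in doc)
    return [{t: tf / totals[t] for t, tf in Counter(doc).items()} for doc in docs]

def dot(a, b):
    return sum(w * b[t] for t, w in a.items() if t in b)

def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0

def jaccard(a, b):
    # Extended Jaccard coefficient for real-valued vectors.
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0

def dice(a, b):
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0

def knn_classify(query, train_vecs, labels, k=3, sim=cosine):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: sim(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus, for illustration only.
train_docs = [
    ["stock", "market", "bank"],
    ["bank", "loan", "interest"],
    ["match", "goal", "team"],
    ["team", "player", "goal"],
]
train_labels = ["finance", "finance", "sport", "sport"]

idf = idf_table(train_docs)
train_vecs = [tfidf_vector(doc, idf) for doc in train_docs]
query_vec = tfidf_vector(["bank", "market"], idf)

for sim in (cosine, jaccard, dice):
    print(sim.__name__, knn_classify(query_vec, train_vecs, train_labels, k=3, sim=sim))
```

Swapping `train_vecs` for `widf_vectors(train_docs)` gives the WIDF variant under the same assumptions. The three coefficients differ only in how the dot product is normalized, which is why they can rank neighbors differently on real data.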
Pages: 296 / +
Page count: 2