A comparative study using vector space model with K-nearest neighbor on text categorization data

Authors
Hadi, Wa'el Musa [1]
Thabtah, Fadi [2]
Abdel-jaber, Hussein [3]
Affiliations
[1] Arab Acad Banking & Financial Sci, Dept Comp Informat Syst, Amman, Jordan
[2] Philadelphia Univ, MIS Dept, Amman, Jordan
[3] Univ Bradford, Dept Comp, Bradford BD7 1DP, W Yorkshire, England
Keywords
data mining; text categorization; term weighting; vector space model
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text categorization is one of the most widely studied problems in data mining and information retrieval. Given a large set of documents, each associated with its corresponding category, categorization involves building a model from the classified documents in order to classify previously unseen documents as accurately as possible. In this paper, we investigate variations of the vector space model using inverse document frequency (IDF) and weighted inverse document frequency (WIDF). Experimental results on eight different data sets provide evidence that the Cosine coefficient outperformed the Jaccard and Dice coefficients with regard to the F1 measure, and that Cosine-based IDF achieved the highest average scores.
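To make the compared measures concrete, the following is a minimal Python sketch of the three similarity coefficients named in the abstract, applied to IDF-weighted term vectors and used inside a K-nearest neighbor vote as in the title. It is an illustration under stated assumptions, not the authors' implementation: the weighting `tf * log(N / df)`, the extended (real-valued) forms of Jaccard and Dice, the majority-vote rule, the toy corpus, and all function names are assumptions chosen for the example. WIDF is not shown because the abstract does not define it.

```python
import math
from collections import Counter


def idf_weights(docs):
    """Map each term to log(N / df), where df counts documents containing it.

    Assumption: the plain log(N / df) form of IDF; the paper may use a variant.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def weighted_vector(doc, idf):
    """Sparse vector {term: tf * idf}; terms unseen in training get weight 0."""
    tf = Counter(doc)
    return {term: freq * idf.get(term, 0.0) for term, freq in tf.items()}


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(w * b[t] for t, w in a.items() if t in b)


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def jaccard(a, b):
    # Extended Jaccard for real-valued vectors (assumed form).
    denom = dot(a, a) + dot(b, b) - dot(a, b)
    return dot(a, b) / denom if denom else 0.0


def dice(a, b):
    # Extended Dice for real-valued vectors (assumed form).
    denom = dot(a, a) + dot(b, b)
    return 2 * dot(a, b) / denom if denom else 0.0


def knn_classify(query, train_docs, train_labels, idf, sim=cosine, k=3):
    """Label a tokenized query by majority vote among its k most similar documents."""
    q = weighted_vector(query, idf)
    scored = sorted(
        ((sim(q, weighted_vector(doc, idf)), label)
         for doc, label in zip(train_docs, train_labels)),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, already tokenized.
train_docs = [
    ["stock", "market", "bank"],
    ["bank", "loan", "interest"],
    ["match", "goal", "team"],
    ["team", "coach", "goal"],
]
train_labels = ["finance", "finance", "sport", "sport"]
idf = idf_weights(train_docs)

for measure in (cosine, jaccard, dice):
    print(measure.__name__, knn_classify(["bank", "loan"], train_docs, train_labels, idf, sim=measure))
```

Swapping the `sim` argument is the only change needed to compare the three coefficients on the same weighted vectors, which mirrors the kind of comparison the abstract describes.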
Pages: 296 / +
Page count: 2