Text categorization based on k-nearest neighbor approach for Web site classification

Cited by: 79
Authors
Kwon, OW [1]
Lee, JH [1]
Affiliation
[1] Pohang Univ Sci & Technol, Dept Comp Sci & Engn, Div Elect & Comp Engn, Pohang 790784, South Korea
Keywords
text categorization; Web site classification; Web page classification; k-nearest neighbor approach; machine learning
DOI
10.1016/S0306-4573(02)00022-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes using the Web pages linked from the home page, in contrast to previous research, which used the home page alone. To implement the proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection phase chooses several representative Web pages using connectivity analysis. The k-NN classifier then classifies each of the selected Web pages. Finally, the page-level classifications are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also revise its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaged breakeven point by 30.02% compared with an ordinary classification that uses the home page only. © 2002 Elsevier Science Ltd. All rights reserved.
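
The three phases named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything in it is an assumption made for illustration: the `site` dictionary layout, the function names, the page selection rule (home page plus the first pages it links to, standing in for the paper's connectivity analysis), the plain term-frequency cosine similarity (the paper uses its own tag-based term weighting and a revised similarity measure), and the summing of page scores into a site label.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_pages(site, max_pages=3):
    """Phase 1 (assumed stand-in for connectivity analysis):
    keep the home page plus the first pages it links to."""
    home = site["home"]
    linked = [p for p in site["links"].get(home, []) if p in site["pages"]]
    return [home] + linked[: max_pages - 1]


def classify_page(text, training, k=3):
    """Phase 2: k-NN page classification.
    Each of the k most similar training documents votes for its
    label with a weight equal to its similarity to the page."""
    vec = Counter(text.lower().split())
    neighbors = sorted(
        ((cosine(vec, Counter(doc.lower().split())), label)
         for doc, label in training),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, label in neighbors:
        scores[label] += sim
    return scores


def classify_site(site, training, k=3, max_pages=3):
    """Phase 3 (assumed aggregation): sum the page-level scores
    and return the category with the highest total."""
    total = Counter()
    for url in select_pages(site, max_pages):
        total.update(classify_page(site["pages"][url], training, k))
    return total.most_common(1)[0][0] if total else None


# Hypothetical toy data, invented for this example.
training = [
    ("cheap flights hotel booking travel", "travel"),
    ("hotel resort travel tour", "travel"),
    ("stock bank loan finance", "finance"),
    ("bank credit loan interest", "finance"),
]
site = {
    "home": "/",
    "pages": {
        "/": "welcome to our company",
        "/tours": "travel tour hotel booking",
        "/deals": "cheap flights resort",
    },
    "links": {"/": ["/tours", "/deals"]},
}

print(classify_site(site, training, k=2))
```

In this toy data the home page shares no terms with any training document, so it contributes nothing to the decision, and the category comes entirely from the linked pages. That is the situation the abstract gives as the motivation for looking beyond the home page.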
Pages: 25-44
Page count: 20