Text categorization based on k-nearest neighbor approach for Web site classification

Cited by: 79
Authors
Kwon, OW [1]
Lee, JH [1]
Affiliation
[1] Pohang Univ Sci & Technol, Dept Comp Sci & Engn, Div Elect & Comp Engn, Pohang 790784, South Korea
Keywords
text categorization; Web site classification; Web page classification; k-nearest neighbor approach; machine learning
DOI
10.1016/S0306-4573(02)00022-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes using the Web pages linked with the home page, in contrast to previous research, which used the home page alone. To implement the proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection phase chooses several representative Web pages using connectivity analysis. The k-NN classifier then classifies each of the selected Web pages. Finally, the page-level classifications are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also revise its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaged breakeven point by 30.02% compared with an ordinary classification that uses the home page only. © 2002 Elsevier Science Ltd. All rights reserved.
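The last two phases described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes the representative pages have already been selected, uses plain term-frequency vectors with cosine similarity in place of the paper's tag-based term weighting and revised similarity measure, and aggregates page-level scores by simple summation. The function names `cosine`, `knn_scores`, and `classify_site` are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_scores(page, training, k=3):
    """Web page classification: score categories by the k most similar training pages.

    `training` is a list of (text, label) pairs (an assumed format).
    """
    vec = Counter(page.lower().split())
    neighbors = sorted(
        ((cosine(vec, Counter(text.lower().split())), label)
         for text, label in training),
        reverse=True,
    )[:k]
    scores = Counter()
    for sim, label in neighbors:
        scores[label] += sim  # similarity-weighted vote
    return scores


def classify_site(pages, training, k=3):
    """Web site classification: sum page-level scores over the selected pages."""
    total = Counter()
    for page in pages:
        total.update(knn_scores(page, training, k))
    return total.most_common(1)[0][0] if total else None
```

Summing similarity-weighted votes lets pages with no overlap with the training data (such as a sparse home page) contribute nothing, so the linked pages can decide the site's category. Other aggregation rules are possible, and the paper's own rule may differ.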
Pages: 25-44
Page count: 20