Web page classification based on heterogeneous features and a combination of multiple classifiers

Authors: Li Deng, Xin Du, Ji-zhong Shen
Affiliation: Zhejiang University, College of Information Science & Electronic Engineering
Keywords: Web page classification; Web page features; Combined classifiers; TP391
Abstract
Web pages can be classified precisely by evaluating their features, and the structural features of a web page effectively complement its textual features. Different classifiers have different characteristics, so combining multiple classifiers allows them to complement one another. In this study, we propose a web page classification method based on heterogeneous features and a combination of multiple classifiers. Rather than computing the frequencies of HTML tags, we exploit the tree-like structure of HTML tags to characterize the structural features of a web page. Heterogeneous textual features and the proposed tree-like structural features are converted into vectors and fused. We propose confidence, calculated as a classifier's classification accuracy on a set of samples, as a criterion for comparing the classification results of different classifiers. Multiple classifiers are then combined on the basis of confidence using different decision strategies, such as voting, confidence comparison, and direct output, to produce the final classification results. Experimental results show that the accuracies on the Amazon, 7-web-genres, and DMOZ datasets increase to 94.2%, 95.4%, and 95.7%, respectively. Fusing the textual features with the proposed structural features characterizes web pages more comprehensively and yields higher accuracy than using textual features alone. In addition, combining multiple classifiers further improves classification accuracy, which exceeds that of related web page classification algorithms.
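The abstract names confidence and three decision strategies but does not state exactly how they interact. The Python sketch below illustrates one plausible reading under explicit assumptions, not the authors' implementation: confidence is each classifier's accuracy on a held-out validation set; if all classifiers agree, their label is output directly; if one label has a strict majority, voting decides; otherwise the prediction of the most confident classifier wins. The names `confidence`, `combine`, and `classify`, the toy threshold classifiers, and the labels are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical: a "classifier" is any callable mapping a feature vector to a label.
Classifier = Callable[[Sequence[float]], str]


def confidence(clf: Classifier, samples: List[Sequence[float]], labels: List[str]) -> float:
    """Assumed definition: confidence = accuracy of clf on a held-out sample set."""
    if not samples:
        raise ValueError("need at least one sample to estimate confidence")
    correct = sum(1 for x, y in zip(samples, labels) if clf(x) == y)
    return correct / len(samples)


def combine(predictions: List[str], confidences: List[float]) -> Tuple[str, str]:
    """Assumed decision rule; returns (label, name of the strategy that fired)."""
    if not predictions or len(predictions) != len(confidences):
        raise ValueError("need one confidence value per prediction")
    label, votes = Counter(predictions).most_common(1)[0]
    if votes == len(predictions):
        return label, "direct output"  # every classifier agrees
    if votes > len(predictions) / 2:
        return label, "voting"  # a strict majority agrees
    # No majority: trust the classifier with the highest confidence (first on ties).
    best = max(range(len(predictions)), key=lambda i: confidences[i])
    return predictions[best], "confidence comparison"


def classify(x: Sequence[float], clfs: List[Classifier], confs: List[float]) -> Tuple[str, str]:
    """Run every classifier on x and combine their outputs."""
    return combine([clf(x) for clf in clfs], confs)


# Toy usage with hypothetical 1-D "fused feature vectors".
clfs: List[Classifier] = [
    lambda x: "news" if x[0] > 0.5 else "shop",
    lambda x: "news" if x[0] > 0.3 else "shop",
    lambda x: "blog",
]
val_x = [[0.9], [0.4], [0.1], [0.7]]
val_y = ["news", "shop", "shop", "news"]
confs = [confidence(c, val_x, val_y) for c in clfs]

if __name__ == "__main__":
    print(confs)
    print(classify([0.9], clfs, confs))
    print(classify([0.4], clfs, confs))
```

In the toy run, the input `[0.9]` is settled by majority vote, while `[0.4]` produces three different labels and falls through to confidence comparison, which picks the validation-accurate first classifier. Ordering the rules this way keeps the cheaper agreement checks first and uses confidence only to break disagreements.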
Published in: Frontiers of Information Technology & Electronic Engineering, 2020, 21(07): 995-1004