Web page classification based on heterogeneous features and a combination of multiple classifiers

Cited by: 0
Authors
Li Deng
Xin Du
Ji-zhong Shen
Affiliation
[1] Zhejiang University, College of Information Science & Electronic Engineering
Keywords
Web page classification; Web page features; Combined classifiers
DOI
Not available
CLC number
TP391
Subject classification number
Abstract
Precise web page classification can be achieved by evaluating the features of web pages, and the structural features of a page effectively complement its textual features. Classifiers differ in their characteristics, so multiple classifiers can be combined to complement one another. In this study, a web page classification method based on heterogeneous features and a combination of multiple classifiers is proposed. Rather than computing the frequency of HTML tags, we exploit the tree-like structure of HTML tags to characterize the structural features of a web page. Heterogeneous textual features and the proposed tree-like structural features are converted into vectors and fused. Confidence, calculated as the classification accuracy on a set of samples, is proposed as a criterion for comparing the classification results of different classifiers. Multiple classifiers are combined based on confidence using different decision strategies, such as voting, confidence comparison, and direct output, to give the final classification results. Experimental results demonstrate that on the Amazon dataset, the 7-web-genres dataset, and the DMOZ dataset, the accuracies are increased to 94.2%, 95.4%, and 95.7%, respectively. Fusing the textual features with the proposed structural features gives a more comprehensive representation and yields higher accuracy than using textual features alone. Combining multiple classifiers further improves the accuracy of web page classification, which is higher than that of related web page classification algorithms.
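The confidence-based combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how the three decision strategies are ordered or which thresholds are used, so the function names (`estimate_confidence`, `combine`), the `direct_threshold` parameter, and the order "direct output, then voting, then confidence comparison" are all assumptions made for illustration, not the authors' algorithm.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence

# A classifier is modeled here as any callable mapping a sample to a label.
Classifier = Callable[[object], Hashable]


def estimate_confidence(clf: Classifier,
                        samples: Sequence[object],
                        labels: Sequence[Hashable]) -> float:
    """Confidence = classification accuracy of `clf` on a set of samples."""
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal in length")
    correct = sum(1 for x, y in zip(samples, labels) if clf(x) == y)
    return correct / len(samples)


def combine(predictions: Sequence[Hashable],
            confidences: Sequence[float],
            direct_threshold: float = 0.95) -> Hashable:
    """Combine per-classifier predictions for one page.

    Hypothetical decision order (an assumption, not taken from the paper):
      1. Direct output: if the most confident classifier reaches
         `direct_threshold`, return its prediction unchanged.
      2. Voting: otherwise, if a strict majority of classifiers agree,
         return the majority label.
      3. Confidence comparison: otherwise, return the prediction of the
         most confident classifier.
    """
    if not predictions or len(predictions) != len(confidences):
        raise ValueError("predictions and confidences must be non-empty and equal in length")
    best = max(range(len(predictions)), key=lambda i: confidences[i])
    if confidences[best] >= direct_threshold:
        return predictions[best]
    label, votes = Counter(predictions).most_common(1)[0]
    if votes > len(predictions) / 2:
        return label
    return predictions[best]
```

In this sketch, `estimate_confidence` would be run once per classifier on a held-out sample set, and the resulting values passed to `combine` together with each classifier's prediction for a new page.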
Pages: 995-1004
Number of pages: 9