Automatic Web Page Classification Using Various Features

Cited by: 0
Authors
Wen, Hao [1]
Fang, Liping [1]
Guan, Ling [2]
Affiliations
[1] Ryerson Univ, Dept Mech & Ind Engn, Toronto, ON, Canada
[2] Ryerson Univ, Dept Elect & Comp Engn, Toronto, ON, Canada
Keywords
Automatic classification; data fusion; ontology
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
A model for automatically classifying uncertain Web pages using multiple features is presented. Since the traditional tree structure can barely cope with an avalanche of new Web pages, the proposed approach combines part of the "bag of words" idea with classification fusion to describe and categorize Web pages. The approach extracts features of Web pages from three perspectives: consulting a Web directory service, analyzing the text features of Web pages' titles and meta-search keywords, and identifying the primary content of Web pages. By fusing the results from these three dedicated classifiers, Web pages are classified into one or more categories, with a set of words representing each Web page. To demonstrate the effectiveness of the proposed method, experiments are carried out in which Web pages are classified into four categories using the proposed fusion method. A comparison between the dedicated classifiers and the fusion methods is also presented.
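The abstract does not state which fusion rule is used, so the following is only a minimal sketch of the general idea: three dedicated classifiers each produce per-category scores, the scores are combined, and a page may receive more than one category. The weighted-average rule, the threshold, the category names, and all function names here are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

# Hypothetical labels: the abstract mentions four categories but does not name them.
CATEGORIES = ["news", "shopping", "sports", "education"]


def fuse_scores(classifier_scores: List[Dict[str, float]],
                weights: List[float]) -> Dict[str, float]:
    """Combine per-category scores with a weighted average (assumed fusion rule)."""
    total_weight = sum(weights)
    fused = {}
    for category in CATEGORIES:
        weighted_sum = sum(
            weight * scores.get(category, 0.0)
            for weight, scores in zip(weights, classifier_scores)
        )
        fused[category] = weighted_sum / total_weight
    return fused


def assign_categories(fused: Dict[str, float], threshold: float = 0.35) -> List[str]:
    """Return every category at or above the threshold, or the single best one."""
    selected = [c for c, score in fused.items() if score >= threshold]
    if not selected:
        selected = [max(fused, key=fused.get)]
    return selected


# Made-up scores standing in for the three dedicated classifiers:
# Web directory lookup, title/keyword text features, and primary content.
directory_scores = {"news": 0.7, "shopping": 0.1, "sports": 0.1, "education": 0.1}
title_scores = {"news": 0.5, "shopping": 0.05, "sports": 0.4, "education": 0.05}
content_scores = {"news": 0.3, "shopping": 0.05, "sports": 0.6, "education": 0.05}

fused = fuse_scores([directory_scores, title_scores, content_scores],
                    weights=[1.0, 1.0, 1.0])
labels = assign_categories(fused)
print(labels)
```

With equal weights, the example page ends up in both "news" and "sports", which illustrates how fusion can assign a page to more than one category even when the individual classifiers disagree on the top label.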
Pages: 368 / +
Page count: 3