An intelligent web-page classifier with fair feature-subset selection

Cited by: 7
Authors
Chen, Chih-Ming
Lee, Hahn-Ming
Tan, Chia-Chen
Affiliations
[1] Natl Chengchi Univ, Grad Inst Lib Informat & Archival Studies, Taipei 116, Taiwan
[2] Natl Taiwan Univ Sci & Technol, Dept Comp Sci & Informat Engn, Taipei 106, Taiwan
Keywords
feature selection; Web page classification; machine learning
DOI
10.1016/j.engappai.2006.02.001
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The explosion of on-line information has given rise to many manually constructed topic hierarchies (such as Yahoo!). But with the current growth rate in the amount of information, manual classification in topic hierarchies results in an immense information bottleneck. Therefore, developing an automatic classifier is an urgent need. However, classifiers suffer from enormous dimensionality, since the dimensionality is determined by the number of distinct keywords in a document corpus. More seriously, most classifiers either work slowly or are constructed subjectively without any learning ability. In this paper, we address these problems with a fair feature-subset selection (FFSS) algorithm and an adaptive fuzzy learning network (AFLN) for classification. The FFSS algorithm is used to reduce the enormous dimensionality. It not only gives fair treatment to each category but also has the ability to identify useful features, including both positive and negative features. The AFLN, in turn, provides extremely fast learning ability to model the uncertain behavior for classification so as to correct the fuzzy matrix automatically. Experimental results show that both the FFSS algorithm and the AFLN lead to a significant improvement in document classification, compared to alternative approaches. (c) 2006 Elsevier Ltd. All rights reserved.
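The abstract does not spell out how FFSS scores or picks features, so the following is only an illustrative sketch of the two properties it names: an equal feature quota per category ("fair" treatment) and the retention of both positive and negative features. The function name `fair_select`, the scoring rule (in-category document frequency minus out-of-category document frequency), and the toy corpus are all assumptions made for illustration, not the authors' method.

```python
def fair_select(docs, labels, k_per_category):
    """Pick k terms per category (hypothetical scoring, not the paper's FFSS).

    docs   : list of sets of terms, one set per document
    labels : list of category names, aligned with docs
    Returns {category: [(term, score), ...]}.
    A positive score marks a term typical of the category; a negative
    score marks a term whose presence argues against the category.
    """
    vocab = set().union(*docs)
    selected = {}
    for c in sorted(set(labels)):
        inside = [d for d, y in zip(docs, labels) if y == c]
        outside = [d for d, y in zip(docs, labels) if y != c]
        scores = {}
        for t in vocab:
            p_in = sum(t in d for d in inside) / len(inside)
            p_out = sum(t in d for d in outside) / len(outside) if outside else 0.0
            scores[t] = p_in - p_out  # assumed signed relevance score
        # Rank by magnitude so strong negative features compete with positive ones;
        # ties are broken alphabetically to keep the result deterministic.
        ranked = sorted(scores.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
        # Every category receives the same quota, regardless of its size.
        selected[c] = ranked[:k_per_category]
    return selected


# Toy corpus (invented for this example).
docs = [{"goal", "team"}, {"goal", "match"}, {"stock", "market"}, {"stock", "bank"}]
labels = ["sports", "sports", "finance", "finance"]
selection = fair_select(docs, labels, k_per_category=2)
```

In this toy run each category keeps exactly two terms: for "sports", `goal` is a positive feature and `stock` a negative one, and the roles are reversed for "finance". Selecting a fixed quota per category, rather than one global top-k list, is one simple way to keep large categories from crowding out small ones.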
Pages: 967-978
Number of pages: 12
Related papers
47 in total
  • [1] An intelligent web-page classifier with fair feature-subset selection
    Lee, HM
    Chen, CM
    Tan, CC
    [J]. JOINT 9TH IFSA WORLD CONGRESS AND 20TH NAFIPS INTERNATIONAL CONFERENCE, PROCEEDINGS, VOLS. 1-5, 2001, : 395 - 400
  • [2] Turning Yahoo into an automatic Web-page classifier
    Mladenic, D
    [J]. ECAI 1998: 13TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 1998, : 473 - 474
  • [3] Rough set-aided feature selection for automatic Web-page classification
    Wakaki, T
    Itakura, H
    Tamura, M
    [J]. IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 70 - 76
  • [4] EFFICIENT FEATURE-SUBSET SELECTION WITH PROBABILISTIC DISTANCE CRITERIA
    CHITTINENI, CB
    [J]. INFORMATION SCIENCES, 1980, 22 (01) : 19 - 35
  • [5] A study on rough set-aided feature selection for automatic Web-page classification
    Wakaki, Toshiko
    Itakura, Hiroyuki
    Tamura, Masaki
    Motoda, Hiroshi
    Washio, Takashi
    [J]. Web Intelligence and Agent Systems, 2006, 4 (04) : 431 - 441
  • [6] Feature subset selection using genetic algorithms for Web page categorization
    Ying, XM
    Liu, M
    Dou, WH
    [J]. COMPUTER SCIENCE AND TECHNOLOGY IN NEW CENTURY, 2001, : 548 - 552
  • [7] FEATURE-SUBSET SELECTION FOR STATISTICAL CLASSIFICATION PROBLEMS INVOLVING UNEQUAL COVARIANCE MATRICES
    YOUNG, DM
    ODELL, PL
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 1986, 15 (01) : 137 - 157
  • [8] A Comparative Study of Feature-Ranking and Feature-Subset Selection Techniques for Improved Fault Prediction
    Rathore, Santosh Singh
    Gupta, Atul
    [J]. PROCEEDINGS OF THE 7TH INDIA SOFTWARE ENGINEERING CONFERENCE 2014, ISEC '14, 2014,
  • [9] Classifier and feature set ensembles for web page classification
    Onan, Aytug
    [J]. JOURNAL OF INFORMATION SCIENCE, 2016, 42 (02) : 150 - 165
  • [10] Study of feature selection in the web page classification
    [J]. 2000, Shanghai Comp Soc, China (26):