A Novel Feature Selection Framework for Automatic Web Page Classification

Authors
J. Alamelu Mangai [1]
V. Santhosh Kumar [1]
S. Appavu alias Balamurugan [2]
Affiliations
[1] Department of Computer Science, BITS Pilani, Dubai Campus, DIAC, Dubai 345055, UAE
[2] Department of Information Technology, Thiagarajar College of Engineering, Madurai 625015, India
Keywords
Feature selection; web page classification; Ward's minimum variance; information gain; WebKB
DOI
Not available
Chinese Library Classification number
TP393.092
Discipline classification code
080402
Abstract
The number of Internet users and the number of web pages added to the WWW increase dramatically every day. Web pages therefore need to be classified into web directories automatically and efficiently, which helps search engines provide users with relevant results quickly. Because web pages are represented by thousands of features, feature selection helps web page classifiers cope with this high dimensionality. This paper proposes a new feature selection method based on Ward's minimum variance measure. The measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark data set, namely WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
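The two steps named in the abstract (cluster redundant features with Ward's minimum variance measure, then keep a representative per cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that features are binary term-occurrence columns, that the number of clusters is supplied by the caller, and that the representative of each cluster is the feature with the highest information gain (suggested by the keywords, but not confirmed by the abstract). All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def ward_cost(a, b):
    """Increase in within-cluster variance if clusters a and b are merged."""
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist


def cluster_features(columns, n_clusters):
    """Greedy agglomerative clustering of feature columns (Ward's criterion).

    columns: one vector per feature, one value per document.
    Returns a list of clusters, each a list of feature indices.
    """
    members = [[i] for i in range(len(columns))]
    stats = [(1, list(col)) for col in columns]  # (cluster size, centroid)
    while len(members) > n_clusters:
        # Pick the pair whose merge increases the variance the least.
        i, j = min(
            ((p, q) for p in range(len(members)) for q in range(p + 1, len(members))),
            key=lambda pq: ward_cost(stats[pq[0]], stats[pq[1]]),
        )
        (ni, ci), (nj, cj) = stats[i], stats[j]
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        members[i] += members[j]
        stats[i] = (ni + nj, centroid)
        del members[j], stats[j]  # j > i, so index i stays valid
    return members


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def information_gain(column, labels):
    """Information gain of a discrete feature with respect to the class labels."""
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain


def select_features(columns, labels, n_clusters):
    """Keep one feature per cluster: the one with the highest information gain.

    Assumption: the abstract does not say how representatives are chosen.
    """
    clusters = cluster_features(columns, n_clusters)
    return sorted(
        max(cluster, key=lambda f: information_gain(columns[f], labels))
        for cluster in clusters
    )


# Toy data (hypothetical): 4 binary term features over 6 pages.
columns = [
    [1, 1, 1, 0, 0, 0],  # feature 0
    [1, 1, 1, 0, 0, 1],  # feature 1, nearly redundant with feature 0
    [0, 1, 0, 1, 0, 1],  # feature 2
    [0, 1, 0, 1, 1, 1],  # feature 3, nearly redundant with feature 2
]
labels = ["course", "course", "course", "faculty", "faculty", "faculty"]
selected = select_features(columns, labels, n_clusters=2)
print(selected)  # [0, 3]
```

The pairwise search is quadratic per merge, which is fine for a toy example but too slow for thousands of features; a library routine for Ward linkage would be the practical choice at that scale.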
Pages: 442 - 448
Page count: 7
Related papers
50 in total
  • [1] A Novel Feature Selection Framework for Automatic Web Page Classification
    J. Alamelu Mangai
    V. Santhosh Kumar
    S. Appavu alias Balamurugan
    [J]. International Journal of Automation & Computing, 2012, (04) : 442 - 448
  • [2] A Novel Feature Selection Framework for Automatic Web Page Classification
    Mangai, J. Alamelu
    Kumar, V. Santhosh
    Balamurugan, S. Appavu Alias
    [J]. INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING, 2012, 9 (04) : 442 - 448
  • [3] Two novel feature selection approaches for web page classification
    Chen, Chih-Ming
    Lee, Hahn-Ming
    Chang, Yu-Jung
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2009, 36 (01) : 260 - 272
  • [4] Automatic web page classification by combining feature selection techniques and lazy learners
    Devi, M. Indra
    Rajaram, R.
    Selvakuberan, K.
    [J]. ICCIMA 2007: INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND MULTIMEDIA APPLICATIONS, VOL II, PROCEEDINGS, 2007, : 33 - 37
  • [5] Study of feature selection in the web page classification
    [J]. 2000, Shanghai Comp Soc, China (26):
  • [6] Rough set-aided feature selection for automatic Web-page classification
    Wakaki, T
    Itakura, H
    Tamura, M
    [J]. IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2004), PROCEEDINGS, 2004, : 70 - 76
  • [7] Feature selection with rough sets for web page classification
    An, AJ
    Huang, YH
    Huang, XJ
    Cercone, N
    [J]. TRANSACTIONS ON ROUGH SETS II: ROUGH SETS AND FUZZY SETS, 2004, 3135 : 1 - 13
  • [8] A web page classification algorithm based on feature selection
    Zhou, Hongfang
    Guo, Jie
    Wang, Xinyi
    Duan, Wencong
    Wang, Peng
    Cao, Wenquan
    [J]. Journal of Information and Computational Science, 2015, 12 (04) : 1549 - 1556
  • [9] Web page feature selection and classification using neural networks
    Selamat, A
    Omatu, S
    [J]. INFORMATION SCIENCES, 2004, 158 : 69 - 88
  • [10] Automatic Web Page Classification
    Materna, Jiri
    [J]. RASLAN 2008: RECENT ADVANCES IN SLAVONIC NATURAL LANGUAGE PROCESSING: SECOND WORKSHOP, 2008, : 84 - 93