A Novel Feature Selection Framework for Automatic Web Page Classification

Authors
J. Alamelu Mangai [1]
V. Santhosh Kumar [1]
S. Appavu alias Balamurugan [2]
Affiliations
[1] Department of Computer Science, BITS Pilani, Dubai Campus, DIAC, Dubai 345055, UAE
[2] Department of Information Technology, Thiagarajar College of Engineering, Madurai 625015, India
Keywords
Feature selection; web page classification; Ward's minimum variance; information gain; WebKB
DOI
Not available
Chinese Library Classification (CLC) number
TP393.092
Discipline classification code
080402
Abstract
The number of Internet users and the number of web pages added to the WWW increase dramatically every day. Web pages must therefore be classified into web directories automatically and efficiently, which helps search engines provide users with relevant results quickly. Because web pages are represented by thousands of features, feature selection helps web page classifiers cope with this high dimensionality. This paper proposes a new feature selection method based on Ward's minimum variance measure. The measure is first used to identify clusters of redundant features in a web page. In each cluster, the best representative features are retained and the others are eliminated. Removing such redundant features helps minimize resource utilization during classification. The proposed method is compared with other common feature selection methods. Experiments on a benchmark data set, WebKB, show that the proposed method performs better than most of the other feature selection methods in terms of reducing the number of features and the classifier modeling time.
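The two-stage idea in the abstract (group redundant features with Ward's minimum variance criterion, then keep a representative per group) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the representative is chosen or how many clusters are formed, so picking the feature with the highest information gain and fixing `n_clusters` are assumptions, and the function names and the toy term-presence data are hypothetical.

```python
import math
from collections import Counter

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    (na, ca), (nb, cb) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq_dist

def ward_clusters(columns, n_clusters):
    """Agglomerative clustering of feature columns using Ward's criterion."""
    # Each cluster is (member feature indices, (size, centroid)).
    clusters = [([i], (1, list(col))) for i, col in enumerate(columns)]
    while len(clusters) > n_clusters:
        # Find the pair whose merge increases the variance the least.
        _, i, j = min(
            (ward_cost(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        (mi, (ni, ci)), (mj, (nj, cj)) = clusters[i], clusters[j]
        centroid = [(ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj)]
        clusters[i] = (mi + mj, (ni + nj, centroid))
        del clusters[j]
    return [sorted(members) for members, _ in clusters]

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(column, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    gain = entropy(labels)
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

def select_features(columns, labels, n_clusters):
    """Keep one feature per cluster (assumption: the one with highest information gain)."""
    selected = [
        max(members, key=lambda f: information_gain(columns[f], labels))
        for members in ward_clusters(columns, n_clusters)
    ]
    return sorted(selected)

# Hypothetical toy data: 4 binary term features observed on 6 pages.
columns = [
    [1, 1, 1, 0, 0, 0],  # feature 0
    [1, 1, 1, 0, 0, 1],  # feature 1, nearly redundant with feature 0
    [0, 1, 0, 1, 0, 1],  # feature 2
    [0, 1, 0, 1, 1, 1],  # feature 3, nearly redundant with feature 2
]
labels = ["course", "course", "course", "faculty", "faculty", "faculty"]

print(ward_clusters(columns, 2))            # [[0, 1], [2, 3]]
print(select_features(columns, labels, 2))  # [0, 3]
```

On this toy input the near-duplicate features end up in the same cluster, and only the more class-informative feature of each pair is kept, halving the feature count before a classifier is trained.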
Pages: 442-448
Number of pages: 7