Hybrid Dimensionality Reduction Approach for Web Page Classification

Cited by: 0
Authors
Sarode, Shraddha [1]
Gadge, Jayant [2]
Affiliations
[1] Thadomal Shahani Engn Coll, Comp Engn ME, Bombay, Maharashtra, India
[2] Thadomal Shahani Engn Coll, Dept Comp Engn, Bombay, Maharashtra, India
Keywords
Dimensionality Reduction; Feature Selection; Information Gain; Naive Bayes; Rough Set; Web Page Classification
DOI
Not available
Chinese Library Classification (CLC) number
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract
A huge amount of data is available on the World Wide Web today, and web page classification is one way to manage it. This paper addresses one issue in web page classification: high dimensionality, where dimensionality refers to the number of terms in a web page. High dimensionality causes problems when classifying web pages, so the main objective of reducing it is to improve the performance of the classifier. The paper describes a hybrid dimensionality reduction approach for web page classification that combines information gain with a rough set method. Information gain is used for feature selection, and the rough set based Quick Reduct algorithm is used for dimensionality reduction. Web pages are then classified with the naive Bayesian method. The proposed architecture was tested, and the authors report significant results.
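The pipeline named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the ordering (information gain first, then Quick Reduct, then naive Bayes), the binary term-presence representation, the Bernoulli naive Bayes variant with Laplace smoothing, the toy documents, the cutoff `k`, and every function name are assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the page'."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - conditional

def dependency(docs, labels, terms):
    """Rough-set dependency degree: the share of pages whose equivalence
    class (same presence pattern over `terms`) contains a single label."""
    seen_labels, sizes = defaultdict(set), Counter()
    for d, y in zip(docs, labels):
        key = tuple(t in d for t in terms)
        seen_labels[key].add(y)
        sizes[key] += 1
    consistent = sum(sizes[k] for k, ys in seen_labels.items() if len(ys) == 1)
    return consistent / len(docs)

def quick_reduct(docs, labels, terms):
    """Greedy Quick Reduct: keep adding the term that raises the dependency
    degree most until it matches the dependency of the full term set."""
    target = dependency(docs, labels, terms)
    reduct = []
    while dependency(docs, labels, reduct) < target:
        best = max((t for t in terms if t not in reduct),
                   key=lambda t: dependency(docs, labels, reduct + [t]))
        reduct.append(best)
    return reduct

def train_nb(docs, labels, terms):
    """Bernoulli naive Bayes with Laplace smoothing (an assumed variant)."""
    model = {}
    for c in set(labels):
        in_class = [d for d, y in zip(docs, labels) if y == c]
        prior = len(in_class) / len(docs)
        likelihood = {t: (sum(t in d for d in in_class) + 1) / (len(in_class) + 2)
                      for t in terms}
        model[c] = (prior, likelihood)
    return model

def predict_nb(model, doc):
    """Return the class with the highest log-posterior for `doc`."""
    def score(c):
        prior, likelihood = model[c]
        return math.log(prior) + sum(
            math.log(p if t in doc else 1 - p) for t, p in likelihood.items())
    return max(model, key=score)

# Hypothetical toy corpus: each page is reduced to its set of terms.
docs = [{"goal", "match", "team"}, {"team", "score", "league"},
        {"match", "player", "team"}, {"stock", "market", "bank"},
        {"bank", "loan", "market"}, {"stock", "price", "team"}]
labels = ["sports", "sports", "sports", "finance", "finance", "finance"]

vocab = sorted(set().union(*docs))
k = 5  # assumed cutoff for the information gain filter
ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t), reverse=True)
selected = sorted(ranked[:k])                  # step 1: feature selection
reduct = quick_reduct(docs, labels, selected)  # step 2: rough-set reduction
model = train_nb(docs, labels, reduct)         # step 3: classification
print(len(vocab), len(selected), reduct, predict_nb(model, {"team", "match"}))
```

In this sketch the information gain filter is a cheap first pass that shrinks the vocabulary, and Quick Reduct then searches only the surviving terms, which keeps the greedy search small. Whether the paper combines the two steps in this order is not stated in the abstract.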
Pages: 6