A Naive Bayes approach for URL classification with supervised feature selection and rejection framework

Cited by: 25
Authors
Rajalakshmi, R. [1 ]
Aravindan, Chandrabose [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Madras, Tamil Nadu, India
[2] Sri Sivasubramaniya Nadar SSN Coll Engn, Dept Comp Sci & Engn, Madras, Tamil Nadu, India
Keywords
feature selection; Naive Bayes classifier; rejection framework; URL classification
DOI
10.1111/coin.12158
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Web page classification has become a challenging task due to the exponential growth of the World Wide Web. Uniform Resource Locator (URL)-based web page classification systems play an important role, but high accuracy may not be achievable because a URL contains minimal information. Nevertheless, a URL-based classifier with a rejection framework can be used as a first-level filter in a multistage classifier, with costlier feature extraction from page contents deferred to later stages. However, the noisy and irrelevant features present in URLs call for feature selection. We therefore propose a supervised feature selection method that identifies relevant URL features using statistical methods. We propose a new feature weighting method for a Naive Bayes classifier that embeds the term goodness obtained from the feature selection method. We also propose a rejection framework for the Naive Bayes classifier that uses the posterior probability as the confidence score. The proposed method is evaluated on the Open Directory Project and WebKB data sets. Experimental results show that our method can be an effective first-level filter. McNemar tests confirm that our approach significantly improves performance.
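The rejection idea in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it is a plain multinomial Naive Bayes with Laplace smoothing, and it leaves out the proposed feature selection and term-goodness weighting entirely. The tokenizer, the class name `RejectingNaiveBayes`, the `threshold` and `alpha` parameters, and the example URLs are all assumptions made for illustration. A URL is rejected (label `None`) when the largest posterior falls below the threshold, so that a later, costlier stage could handle it.

```python
import math
import re
from collections import Counter, defaultdict

# Assumed stop tokens; the paper's actual preprocessing is not given in the abstract.
STOP_TOKENS = {"http", "https", "www"}


def tokenize(url):
    """Split a URL into lowercase alphabetic tokens longer than two characters."""
    parts = re.split(r"[^a-z]+", url.lower())
    return [t for t in parts if len(t) > 2 and t not in STOP_TOKENS]


class RejectingNaiveBayes:
    """Multinomial Naive Bayes that abstains when the top posterior is low."""

    def __init__(self, threshold=0.8, alpha=1.0):
        self.threshold = threshold  # minimum posterior needed to accept a label
        self.alpha = alpha          # Laplace smoothing constant

    def fit(self, urls, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for url, label in zip(urls, labels):
            self.token_counts[label].update(tokenize(url))
        self.vocab = {t for counts in self.token_counts.values() for t in counts}
        return self

    def posteriors(self, url):
        """Return the normalized posterior P(class | URL tokens) for each class."""
        tokens = [t for t in tokenize(url) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_docs / total_docs)
            for t in tokens:
                score += math.log((counts[t] + self.alpha) / denom)
            log_scores[label] = score
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(log_scores.values())
        exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp_scores.values())
        return {k: v / norm for k, v in exp_scores.items()}

    def predict(self, url):
        """Return (label, confidence); label is None when the URL is rejected."""
        post = self.posteriors(url)
        label, confidence = max(post.items(), key=lambda kv: kv[1])
        if confidence < self.threshold:
            return None, confidence
        return label, confidence


if __name__ == "__main__":
    # Toy training data, invented for this example.
    train_urls = [
        "http://www.cs.example.edu/courses/algorithms",
        "http://www.example.edu/courses/databases/syllabus",
        "http://www.example.edu/faculty/smith",
        "http://people.example.edu/faculty/jones/home",
    ]
    train_labels = ["course", "course", "faculty", "faculty"]
    clf = RejectingNaiveBayes(threshold=0.7).fit(train_urls, train_labels)
    print(clf.predict("http://www.example.edu/courses/networks"))
    print(clf.predict("http://www.example.edu/"))
```

Raising `threshold` trades coverage for precision: more URLs are passed to the later stages, but the labels that are accepted at the first level are more reliable.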
Pages: 363-396
Page count: 34