An Effective and Discriminative Feature Learning for URL based Web Page Classification

Cited by: 5
Authors
Rajalakshmi, R. [1 ]
Aravindan, Chandrabose [2 ]
Affiliations
[1] Vellore Inst Technol, Sch Comp Sci & Engn, Chennai, Tamil Nadu, India
[2] SSN Coll Engn, Dept Comp Sci & Engn, Chennai, Tamil Nadu, India
DOI
10.1109/SMC.2018.00240
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The ever-growing World Wide Web contains a large volume of web pages on a wide variety of topics. Many applications, such as information filtering and focused crawling, demand large-scale topic classification of web pages. A URL-based approach classifies a web page without downloading its contents. This paper proposes an automated way of learning a category-specific universal dictionary of discriminating URL features. With this automatically learnt dictionary, the feature vector dimensionality becomes independent of the training set, which overcomes the difficulty of handling large-scale data. The dictionary was constructed from the publicly available ODP dataset. The proposed approach was evaluated by applying the automatically learnt URL feature dictionaries to another dataset that contains search results from Google. Experiments show macro-averaged precision, recall and F1 values of 0.93, 0.85 and 0.88, respectively. The difference is not statistically significant when the universal dictionary is applied instead of a dataset-specific term dictionary.
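The idea of a fixed, category-specific dictionary of URL terms can be illustrated with a minimal sketch. This is not the authors' method: the tokenization, the stop list `STOP`, the ratio-based ranking, the `top_k` cutoff, and the toy URLs are all assumptions made for illustration, since the abstract does not specify how the discriminating features are selected. The sketch only shows why a dictionary learnt once gives feature vectors whose length does not depend on the data being classified.

```python
import re
from collections import Counter, defaultdict

# Assumed stop list of URL boilerplate tokens; not taken from the paper.
STOP = {"http", "https", "www", "com", "org", "net", "html", "htm"}


def url_tokens(url):
    """Split a URL into lowercase alphabetic tokens (assumed tokenization)."""
    parts = re.split(r"[^a-z]+", url.lower())
    return [p for p in parts if len(p) > 2 and p not in STOP]


def build_dictionary(labelled_urls, top_k=2):
    """Pick, per category, the top_k tokens that occur mostly in that category.

    The ranking (share of a token's URLs that fall in the category, then raw
    count, then alphabetical order) is a hypothetical stand-in for whatever
    discriminative criterion the paper uses.
    """
    per_cat = defaultdict(Counter)
    total = Counter()
    for url, cat in labelled_urls:
        for tok in set(url_tokens(url)):  # count each token once per URL
            per_cat[cat][tok] += 1
            total[tok] += 1
    dictionary = {}
    for cat, counts in per_cat.items():
        ranked = sorted(
            counts, key=lambda t: (-counts[t] / total[t], -counts[t], t)
        )
        dictionary[cat] = ranked[:top_k]
    return dictionary


def featurize(url, vocabulary):
    """Binary vector over a fixed vocabulary; its length never changes."""
    present = set(url_tokens(url))
    return [1 if term in present else 0 for term in vocabulary]


# Toy training data with made-up URLs and labels.
train = [
    ("http://www.example.com/sports/football-scores.html", "sports"),
    ("http://www.example.org/football/league", "sports"),
    ("http://www.example.com/health/diet-tips", "health"),
    ("http://www.example.net/diet/nutrition", "health"),
]

dictionary = build_dictionary(train, top_k=2)
vocabulary = sorted({t for terms in dictionary.values() for t in terms})

# A URL from a different source is mapped onto the same fixed vocabulary.
vector = featurize("https://news.example.com/football/transfer-rumours", vocabulary)
```

Because `vocabulary` is fixed once the dictionary is built, any URL from any other dataset maps to a vector of the same length, which can then be passed to a classifier of choice.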
Pages: 1374-1379
Number of pages: 6