Quantitative evaluation of web metrics for automatic genre classification of web pages

Cited by: 4
Authors
Malhotra R. [1]
Sharma A. [2]
Affiliations
[1] Department of Computer Science, Delhi Technological University, Bawana Road, Delhi
[2] Department of Planning, Monitoring and Evaluation, CSIR-National Physical Laboratory, Dr K S Krishnan Marg, New Delhi
Keywords
Entertainment websites; Machine learning; Web genre classification; Web metrics
DOI
10.1007/s13198-017-0629-1
Abstract
Introducing a genre class for each web page adds a dimension that helps a web search engine respond swiftly and relevantly. Web genre classification distinguishes between pages by features such as functionality, style, presentation layout, form and meta-content rather than by content. In this work, nineteen web metrics are identified according to the lexical, structural and functionality attributes of a web page rather than its topic. The study determines which of these attributes (lexical, structural and functionality), or which combinations of them, are significant for developing a web genre classification model. We also investigate the best web genre prediction model using parametric (Logistic Regression), non-parametric (Decision Tree) and ensemble (Bagging, Boosting) machine learning algorithms. We built forty-two genre classification models to classify web pages into the Movie, TV or Music genre, using sample data extracted from Pixel Awards nominated and award-winning websites. Area under the curve analysis of these forty-two models shows that the ensemble algorithms perform better. The remaining models perform acceptably only when the lexical and structural attributes are fed in combination. Functionality metrics were found to degrade performance considerably, irrespective of the algorithm used. Overall, the results indicate that machine learning models can predict web genre, provided the input metrics are chosen appropriately. © 2017, The Society for Reliability Engineering, Quality and Operations Management (SREQOM), India and The Division of Operation and Maintenance, Luleå University of Technology, Sweden.
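The experimental design in the abstract, which tries the three attribute groups alone and in combination and compares models by area under the curve, can be illustrated with a minimal sketch. It is not the authors' code: the function names `group_combinations` and `auc` are hypothetical, the abstract does not say how the forty-two models break down across groups and algorithms, and the AUC here is the plain binary form (one genre against the rest), which is an assumption about how a three-genre problem would be scored.

```python
from itertools import combinations

# Attribute groups named in the abstract. The nineteen individual metrics
# are not listed there, so none are modelled here.
GROUPS = ("lexical", "structural", "functionality")


def group_combinations(groups=GROUPS):
    """Return every non-empty combination of attribute groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)
        for combo in combinations(groups, size)
    ]


def auc(scores, labels):
    """Binary area under the ROC curve (label 1 = positive class).

    Computed as the fraction of positive/negative pairs in which the
    positive example receives the higher score; ties count as half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    for combo in group_combinations():
        print(" + ".join(combo))
    # Hypothetical scores for a "Movie vs. rest" classifier.
    print(auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

Three groups give seven non-empty combinations, so each algorithm would be trained once per combination and the resulting AUC values compared. The pairwise AUC above is quadratic in the number of examples, which is adequate for a small sample but not for large datasets.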
Pages: 1567-1579
Number of pages: 12