Web page classification based on a support vector machine using a weighted vote schema

Cited by: 94
Authors
Chen, Rung-Ching [1 ]
Hsieh, Chung-Hsun [1 ]
Affiliations
[1] Chaoyang Univ Technol, Dept Informat Management, Taichung, Taiwan
Keywords
latent semantic analysis; support vector machine; web page classification; feature extraction;
DOI
10.1016/j.eswa.2005.09.079
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional information retrieval methods use keywords occurring in documents to determine the class of the documents, but they often retrieve unrelated web pages. To classify web pages effectively and address the synonymous keyword problem, we propose a web page classification method based on a support vector machine (SVM) using a weighted vote schema for various features. The system uses both latent semantic analysis and web page feature selection for training and recognition by the SVM model. Latent semantic analysis is used to find the semantic relations between keywords and between documents. It projects terms and documents into a vector space to find latent information in the documents. At the same time, we extract text features from the web page content, through which web pages are classified into a suitable category. The two feature sets are sent to the SVM separately for training and testing. Based on the output of the SVM, a voting schema determines the category of the web page. Experimental results indicate that our method is more effective than traditional methods. (C) 2005 Elsevier Ltd. All rights reserved.
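The voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual voting formula, so everything here is an assumption made for illustration: the function name `weighted_vote`, the model labels `lsa` and `text`, the weights, and the per-category scores are all hypothetical, and the rule shown (a weighted sum of per-category scores) is only one plausible reading of a "weighted vote schema".

```python
from typing import Dict


def weighted_vote(scores_by_model: Dict[str, Dict[str, float]],
                  weights: Dict[str, float]) -> str:
    """Return the category with the highest weighted sum of scores.

    Assumed rule (not taken from the paper): each classifier's
    per-category score is multiplied by that classifier's weight,
    and the products are summed per category.
    """
    totals: Dict[str, float] = {}
    for model, scores in scores_by_model.items():
        w = weights[model]
        for category, score in scores.items():
            totals[category] = totals.get(category, 0.0) + w * score
    # Iterate categories in sorted order so ties resolve deterministically.
    return max(sorted(totals), key=lambda c: totals[c])


# Hypothetical outputs of two SVMs: one trained on latent semantic
# analysis features, one trained on text features of the page.
svm_outputs = {
    "lsa": {"sports": 0.2, "finance": 0.7, "travel": 0.1},
    "text": {"sports": 0.6, "finance": 0.3, "travel": 0.1},
}
weights = {"lsa": 0.6, "text": 0.4}  # illustrative weights only

predicted = weighted_vote(svm_outputs, weights)
print(predicted)
```

With these made-up numbers the two classifiers disagree, and the weights decide which category wins; giving all the weight to one model reduces the vote to that model's own prediction.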
Pages: 427-435
Number of pages: 9
Related papers
50 in total
  • [1] Web page classification using an ensemble of support vector machine classifiers
    Zhong S.
    Zou D.
    [J]. Journal of Networks, 2011, 6 (11) : 1625 - 1630
  • [2] Implicit Links Based Kernel to Enrich Support Vector Machine for Web Page Classification
    Belmouhcine, Abdelbadie
    Benkhalifa, Mohammed
    [J]. 2015 10TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS: THEORIES AND APPLICATIONS (SITA), 2015,
  • [3] Weighted support vector machine for classification
    Du, SX
    Chen, ST
    [J]. INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS, VOL 1-4, PROCEEDINGS, 2005, : 3866 - 3871
  • [4] Web Service Classification using Support Vector Machine
    Wang, Hongbing
    Shi, Yanqi
    Zhou, Xuan
    Zhou, Qianzhao
    Shao, Shizhi
    Bouguettaya, Athman
    [J]. 22ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2010), PROCEEDINGS, VOL 1, 2010,
  • [5] Web Document Classification using Support Vector Machine
    Shinde, Sharmila
    Joeg, Prasanna
    Vanjale, Sandeep
    [J]. 2017 INTERNATIONAL CONFERENCE ON CURRENT TRENDS IN COMPUTER, ELECTRICAL, ELECTRONICS AND COMMUNICATION (CTCEEC), 2017, : 688 - 691
  • [6] Web Page Classification Based-on A Least Square Support Vector Machine with Latent Semantic Analysis
    Zhang, Yong
    Fan, Bin
    Xiao, Long-bin
    [J]. FIFTH INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS AND KNOWLEDGE DISCOVERY, VOL 2, PROCEEDINGS, 2008, : 528 - 532
  • [7] Support vector machine classification on the web
    Pavlidis, P
    Wapinski, I
    Noble, WS
    [J]. BIOINFORMATICS, 2004, 20 (04) : 586 - 587
  • [8] STRUCTURE-BASED CLASSIFICATION OF WEB DOCUMENTS USING SUPPORT VECTOR MACHINE
    He, Kejing
    Li, Chenyang
    [J]. PROCEEDINGS OF 2016 4TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (IEEE CCIS 2016), 2016, : 215 - 219
  • [9] Polarity Classification on Web-based Reviews using Support Vector Machine
    da Rocha, Renato S. C.
    Forero, Leonardo
    de Mello, Harold, Jr.
    Kohler, Manoela
    Vellasco, Marley
    [J]. 2016 IEEE LATIN AMERICAN CONFERENCE ON COMPUTATIONAL INTELLIGENCE (LA-CCI), 2016,
  • [10] Semantic Similarity based Web Document Classification Using Support Vector Machine
    Chinniyan, Kavitha
    Gangadharan, Sudha
    Sabanaikam, Kiruthika
    [J]. INTERNATIONAL ARAB JOURNAL OF INFORMATION TECHNOLOGY, 2017, 14 (03) : 285 - 292