Application of rough ensemble classifier to web services categorization and focused crawling

Cited by: 4
Authors
Saha S. [1]
Murthy C.A. [1]
Pal S.K. [1]
Affiliation
[1] Center for Soft Computing Research, Indian Statistical Institute
Source
Web Intelligence and Agent Systems | 2010 / Vol. 8 / No. 2
Keywords
Focused crawling; Rough ensemble classifier; URL prediction; Web service categorization; WSDL tag structure;
DOI
10.3233/WIA-2010-0186
Abstract
This paper discusses applications of the rough ensemble classifier [27] to two emerging problems in web mining: the categorization of web services and topic-specific web crawling. Both applications consist of two major steps: (1) splitting the feature space according to the internal tag structure of web services and hypertext, so that documents can be represented in a tensor space model, and (2) combining the classifications obtained on the different tensor components using the rough ensemble classifier. The first application is the classification of web services, where a two-step improvement over existing classification results is shown. In the first step, the tensor space model yields better classification results than existing methods. In the second step, the rough set based ensemble classifier improves the results further. The second application is focused crawling using rough ensemble prediction. The experiments on this application show a better harvest rate and better target recall for focused crawling. © 2010 - IOS Press and the authors. All rights reserved.
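Step (2) of the abstract, combining per-component classifications, can be illustrated with a small sketch. This is not the authors' implementation: it assumes the base classifiers (one per tag-based tensor component) have already produced a label each, and it replaces rough set rule generation with a plain decision-table lookup over those outputs. The function names, the three components, and the category labels are all hypothetical.

```python
from collections import Counter, defaultdict


def fit_decision_table(base_predictions, labels):
    """Group training items by their tuple of base-classifier outputs and
    store the most frequent true label of each group (simplified stand-in
    for rules derived from a rough set decision table)."""
    groups = defaultdict(Counter)
    for preds, label in zip(base_predictions, labels):
        groups[tuple(preds)][label] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in groups.items()}


def combine(table, preds):
    """Return the learned decision for this tuple of base outputs;
    fall back to a majority vote when the tuple was never seen."""
    key = tuple(preds)
    if key in table:
        return table[key]
    return Counter(key).most_common(1)[0][0]


# Hypothetical outputs of three base classifiers, one per tag component.
train_preds = [
    ("travel", "travel", "finance"),
    ("travel", "finance", "finance"),
    ("finance", "finance", "finance"),
    ("travel", "finance", "finance"),
]
train_labels = ["travel", "travel", "finance", "travel"]

table = fit_decision_table(train_preds, train_labels)
seen = combine(table, ("travel", "finance", "finance"))
unseen = combine(table, ("finance", "travel", "travel"))
print(seen, unseen)
```

In this toy data the lookup returns "travel" for a tuple where a simple majority vote would return "finance", which shows how a learned combination can weight one component over the others.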
Pages: 181-202
Page count: 21
Related papers
50 items in total
  • [31] Probabilistic graphical model for efficient focused web crawling
    Huang, Jianbin
    Ji, Hongbing
    Sun, Heli
    Journal of Computational Information Systems, 2007, 3 (04): : 1657 - 1664
  • [32] A rough fuzzy approach to web usage categorization
    Asharaf, S
    Murty, MN
    FUZZY SETS AND SYSTEMS, 2004, 148 (01) : 119 - 129
  • [33] iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling
    Gossen, Gerhard
    Demidova, Elena
    Risse, Thomas
    PROCEEDINGS OF THE 15TH ACM/IEEE-CS JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL'15), 2015, : 75 - 84
  • [34] Focused web crawling strategy based on web semantic analysis and web link analysis
    Xihua University Archives, Chengdu, Sichuan, 610039, China
    Unknown
    J. Comput. Inf. Syst., 2009, 6 (1793-1800):
  • [35] Rough Set-based SVM Classifier for Text Categorization
    Chen, Peng
    Liu, Shuang
    ICNC 2008: FOURTH INTERNATIONAL CONFERENCE ON NATURAL COMPUTATION, VOL 2, PROCEEDINGS, 2008, : 153 - +
  • [36] Study of Tibetan Text Categorization Based on Ensemble Learning Classifier
    Li Ailin
    Yu Hongzhi
    Yuan Bin
    2015 IEEE ADVANCED INFORMATION TECHNOLOGY, ELECTRONIC AND AUTOMATION CONTROL CONFERENCE (IAEAC), 2015, : 69 - 72
  • [37] Ontology-based focused crawling of Deep Web sources
    Fang, Wei
    Cui, Zhiming
    Zhao, Pengpeng
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, 2007, 4798 : 514 - 519
  • [38] A Tree-Structure Classifier Ensemble for Tracked Target Categorization
    Yang, Yaling
    Wang, Haihui
    Zeng, Kun
    Lv, Han
    Li, Shanshan
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009, : 1961 - 1965
  • [39] An adaptive focused Web crawling algorithm based on learning automata
    Torkestani, Javad Akbari
    APPLIED INTELLIGENCE, 2012, 37 (04) : 586 - 601
  • [40] Web image size prediction for efficient focused image crawling
    Andreadou, Katerina
    Papadopoulos, Symeon
    Kompatsiaris, Yiannis
    2015 13TH INTERNATIONAL WORKSHOP ON CONTENT-BASED MULTIMEDIA INDEXING (CBMI), 2015,