Application of rough ensemble classifier to web services categorization and focused crawling

Cited by: 4
Authors
Saha S. [1]
Murthy C.A. [1]
Pal S.K. [1]
Affiliations
[1] Center for Soft Computing Research, Indian Statistical Institute
Source
Web Intelligence and Agent Systems | 2010 / Vol. 8 / No. 2
Keywords
Focused crawling; Rough ensemble classifier; URL prediction; Web service categorization; WSDL tag structure;
DOI
10.3233/WIA-2010-0186
Abstract
This paper discusses applications of the rough ensemble classifier [27] to two emerging problems in web mining: the categorization of web services and topic-specific web crawling. Both applications consist of two major steps: (1) splitting the feature space according to the internal tag structure of web services and hypertext, so that documents are represented in a tensor space model, and (2) combining the classifications obtained on the different tensor components using the rough ensemble classifier. The first application is the classification of web services, where a two-step improvement over existing classification results is shown. In the first step, the tensor space model yields better classification results than existing approaches. In the second step, the rough set based ensemble classifier improves the results further. The second application is focused crawling using rough ensemble prediction. Experiments on this application show a better harvest rate and better target recall for focused crawling. © 2010 - IOS Press and the authors. All rights reserved.
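The two crawling metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (harvest rate as the fraction of crawled pages that are relevant, target recall as the fraction of a known target set that the crawler reached); the paper's exact definitions may differ in detail. The function names, the `is_relevant` callback, and the example URLs are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable


def harvest_rate(crawled: Iterable[str], is_relevant: Callable[[str], bool]) -> float:
    """Fraction of crawled pages judged relevant (assumed definition)."""
    pages = list(crawled)
    if not pages:
        return 0.0  # nothing crawled yet, so report zero rather than divide by zero
    relevant = sum(1 for url in pages if is_relevant(url))
    return relevant / len(pages)


def target_recall(crawled: Iterable[str], targets: Iterable[str]) -> float:
    """Fraction of a known target set that the crawler reached (assumed definition)."""
    target_set = set(targets)
    if not target_set:
        return 0.0  # recall is undefined without targets; report zero here
    return len(target_set & set(crawled)) / len(target_set)


# Hypothetical crawl: five fetched pages, three of them on topic.
example_crawl = ["a", "b", "c", "d", "e"]
example_relevant = {"a", "c", "e"}
# Hypothetical held-out target pages, two of which were reached.
example_targets = {"a", "c", "x", "y"}

example_harvest = harvest_rate(example_crawl, lambda url: url in example_relevant)
example_recall = target_recall(example_crawl, example_targets)
```

In this toy crawl, three of five fetched pages are relevant and two of four target pages were reached, so the two measures capture different aspects of crawl quality: precision of what was fetched versus coverage of what should have been found.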
Pages: 181 - 202
Page count: 21
Related papers
50 in total
  • [21] Focused web crawling in the acquisition of comparable corpora
    Tuomas Talvensaari
    Ari Pirkola
    Kalervo Järvelin
    Martti Juhola
    Jorma Laurikkala
    Information Retrieval, 2008, 11 : 427 - 445
  • [22] Focused Crawling for Building Web Comment Corpora
    Neunerdt, Melanie
    Niermann, Markus
    Mathar, Rudolf
    Trevisan, Bianka
    2013 IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE (CCNC), 2013, : 685 - 688
  • [23] Focused web crawling in the acquisition of comparable corpora
    Talvensaari, Tuomas
    Pirkola, Ari
    Jarvelin, Kalervo
    Juhola, Martti
    Laurikkala, Jorma
    INFORMATION RETRIEVAL, 2008, 11 (05): : 427 - 445
  • [24] Hybrid focused crawling on the Surface and the Dark Web
    Iliou C.
    Kalpakis G.
    Tsikrika T.
    Vrochidis S.
    Kompatsiaris I.
    EURASIP Journal on Information Security, 2017 (1)
  • [25] Neuro-fuzzy Rough Classifier Ensemble
    Korytkowski, Marcin
    Nowicki, Robert
    Scherer, Rafal
    ARTIFICIAL NEURAL NETWORKS - ICANN 2009, PT I, 2009, 5768 : 817 - 823
  • [26] Fuzzy-rough Classifier Ensemble Selection
    Diao, Ren
    Shen, Qiang
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 1516 - 1522
  • [27] Synonyms extraction using Web content focused crawling
    Chen, Chien-Hsing
    Hsu, Chung-Chian
    INFORMATION RETRIEVAL TECHNOLOGY, 2008, 4993 : 286 - 297
  • [28] Focused crawling of tagged web resources using ontology
    Bedi, Punam
    Thukral, Anjali
    Banati, Hema
    COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (02) : 613 - 628
  • [29] A Web-Based Semantic Focused Crawling Approach
    Liu, Yongjian
    Ma, Deng
    Sun, Jianpeng
    2013 INTERNATIONAL CONFERENCE ON CYBER SCIENCE AND ENGINEERING (CYBERSE 2013), 2013, : 287 - 293
  • [30] Exploiting multiple features with MEMMs for focused web crawling
    Liu, Hongyu
    Milios, Evangelos
    Korba, Larry
    NATURAL LANGUAGE AND INFORMATION SYSTEMS, PROCEEDINGS, 2008, 5039 : 99 - +