An adaptive focused Web crawling algorithm based on learning automata

Cited by: 28
Authors
Torkestani, Javad Akbari [1]
Affiliations
[1] Islamic Azad Univ, Arak Branch, Arak, Iran
Keywords
Web crawling; Focused Web crawler; Search engine; Learning automata; AD-HOC NETWORKS; SEMANTIC WEB; INFORMATION; RETRIEVAL; SYSTEM; GRAPH;
DOI
10.1007/s10489-012-0351-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent years have witnessed the birth and explosive growth of the Web. This exponential growth has made the Web a huge source of information in which finding a document without an efficient search engine is unimaginable. Web crawling has become an important aspect of Web search, and the performance of search engines depends strongly on it. Focused Web crawlers try to concentrate the crawling process on topic-relevant Web documents. Topic-oriented crawlers are widely used in domain-specific Web search portals and personalized search tools. This paper designs a decentralized, learning automata-based focused Web crawler. Taking advantage of learning automata, the proposed crawler learns the most relevant URLs and the promising paths leading to the target on-topic documents. It can effectively adapt its configuration to the Web dynamics. The crawler is expected to have a higher precision rate because it constructs a small Web graph of only on-topic documents. The convergence of the proposed algorithm is proved on the basis of the Martingale theorem. To show the performance of the proposed crawler, extensive simulation experiments are conducted. The results show the superiority of the proposed crawler over several existing methods in terms of precision, recall, and running time. A t-test is used to verify the statistical significance of the precision results of the proposed crawler.
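The abstract does not give the crawler's update rule, so the following is only a minimal sketch of how a learning automaton could learn which outgoing links of a page are worth following. It assumes a standard linear reward-inaction scheme, which may differ from the scheme used in the paper. The class `LinkAutomaton`, the function `train`, the `is_on_topic` relevance test, and the example URLs are all hypothetical names introduced for illustration.

```python
import random


class LinkAutomaton:
    """Illustrative variable-structure learning automaton over a page's links.

    Assumption: linear reward-inaction updates; not taken from the paper.
    """

    def __init__(self, links, learning_rate=0.1):
        if not links:
            raise ValueError("at least one link is required")
        self.links = list(links)
        self.a = learning_rate
        # Start with a uniform action probability vector.
        self.p = [1.0 / len(self.links)] * len(self.links)

    def choose(self, rng):
        """Sample a link index according to the current probabilities."""
        return rng.choices(range(len(self.links)), weights=self.p, k=1)[0]

    def reward(self, index):
        """Raise the chosen link's probability and scale the others down.

        p_i <- p_i + a * (1 - p_i); p_j <- (1 - a) * p_j for j != i.
        The vector still sums to 1 after the update.
        """
        self.p = [
            pj + self.a * (1.0 - pj) if j == index else pj * (1.0 - self.a)
            for j, pj in enumerate(self.p)
        ]


def train(automaton, is_on_topic, steps, seed=0):
    """Repeatedly pick a link and reward it only if it leads on-topic.

    Off-topic choices leave the probabilities unchanged (the 'inaction' part).
    """
    rng = random.Random(seed)
    for _ in range(steps):
        i = automaton.choose(rng)
        if is_on_topic(automaton.links[i]):
            automaton.reward(i)
    return automaton.p


# Hypothetical usage: only the first URL is treated as on-topic.
links = ["http://example.org/crawling", "http://example.org/sports",
         "http://example.org/recipes", "http://example.org/weather"]
automaton = LinkAutomaton(links, learning_rate=0.1)
probabilities = train(automaton, lambda url: url.endswith("crawling"), steps=200)
```

Under these assumptions the probability mass drifts toward the link that keeps being rewarded, which is the sense in which an automaton attached to a page can "learn" a promising path.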
Pages: 586-601
Page count: 16
Related papers
50 in total
  • [41] A learning automata-based adaptive uniform fractional guard channel algorithm
    Beigy, Hamid
    Meybodi, M. R.
    JOURNAL OF SUPERCOMPUTING, 2015, 71 (03) : 871 - 893
  • [42] Focused crawling of tagged web resources using ontology
    Bedi, Punam
    Thukral, Anjali
    Banati, Hema
    COMPUTERS & ELECTRICAL ENGINEERING, 2013, 39 (02) : 613 - 628
  • [43] Exploiting multiple features with MEMMs for focused web crawling
    Liu, Hongyu
    Milios, Evangelos
    Korba, Larry
    NATURAL LANGUAGE AND INFORMATION SYSTEMS, PROCEEDINGS, 2008, 5039 : 99 - +
  • [44] A Novel Crawling Algorithm for Web Pages
    Golshani, Mohammad Amin
    Derhami, Vali
    ZarehBidoki, AliMohammad
    INFORMATION RETRIEVAL TECHNOLOGY, 2011, 7097 : 263 - 272
  • [45] Application of structured document parsing to focused web crawling
    Patel, Ahmed
    Schmidt, Nikita
    COMPUTER STANDARDS & INTERFACES, 2011, 33 (03) : 325 - 331
  • [46] Probabilistic graphical model for efficient focused web crawling
    Huang, Jianbin
    Ji, Hongbing
    Sun, Heli
    Journal of Computational Information Systems, 2007, 3 (04) : 1657 - 1664
  • [47] A supervised learning-based approach for focused web crawling for IoMT using global co-occurrence matrix
    Rajiv, S.
    Navaneethan, C.
    EXPERT SYSTEMS, 2023, 40 (04)
  • [48] Self-Adaptive Ontology-based Focused Crawling: A Literature Survey
    Khan, Mohd. Aamir
    Sharma, Dilip Kumar
    2016 5TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (TRENDS AND FUTURE DIRECTIONS) (ICRITO), 2016, : 595 - 601
  • [49] Reinforcement Learning with Classifier Selection for Focused Crawling
    Partalas, Ioannis
    Paliouras, Georgios
    Vlahavas, Ioannis
    ECAI 2008, PROCEEDINGS, 2008, 178 : 759 - +
  • [50] iCrawl: Improving the Freshness of Web Collections by Integrating Social Web and Focused Web Crawling
    Gossen, Gerhard
    Demidova, Elena
    Risse, Thomas
    PROCEEDINGS OF THE 15TH ACM/IEEE-CS JOINT CONFERENCE ON DIGITAL LIBRARIES (JCDL'15), 2015, : 75 - 84