An adaptive focused Web crawling algorithm based on learning automata

Cited by: 28
Author
Torkestani, Javad Akbari [1]
Affiliation
[1] Islamic Azad Univ, Arak Branch, Arak, Iran
Keywords
Web crawling; Focused Web crawler; Search engine; Learning automata; AD-HOC NETWORKS; SEMANTIC WEB; INFORMATION; RETRIEVAL; SYSTEM; GRAPH;
DOI
10.1007/s10489-012-0351-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed the birth and explosive growth of the Web. This exponential growth has made the Web a huge source of information in which finding a document without an efficient search engine is unimaginable. Web crawling has become an important aspect of Web search, on which the performance of search engines strongly depends. Focused Web crawlers try to focus the crawling process on topic-relevant Web documents. Topic-oriented crawlers are widely used in domain-specific Web search portals and personalized search tools. This paper designs a decentralized learning automata-based focused Web crawler. Taking advantage of learning automata, the proposed crawler learns the most relevant URLs and the promising paths leading to the target on-topic documents. It can effectively adapt its configuration to the Web dynamics. The crawler is expected to have a higher precision rate because it constructs a small Web graph of only on-topic documents. The convergence of the proposed algorithm is proved based on the Martingale theorem. To show the performance of the proposed crawler, extensive simulation experiments are conducted. The results show the superiority of the proposed crawler over several existing methods in terms of precision, recall, and running time. The t-test is used to verify the statistical significance of the precision results of the proposed crawler.
Pages: 586-601
Number of pages: 16
Related papers
50 in total
  • [21] An adaptive learning to rank algorithm: Learning automata approach
    Torkestani, Javad Akbari
    DECISION SUPPORT SYSTEMS, 2012, 54 (01) : 574 - 583
  • [22] Knowledgebase Harvesting for User-Adaptive Systems Through Focused Crawling and Semantic Web
    Raufi, Bujar
    Ismaili, Florije
    Ajdari, Jaumin
    Zenuni, Xhemal
    COMPUTER SYSTEMS AND TECHNOLOGIES, COMPSYSTECH'16, 2016, : 323 - 330
  • [23] Learnable Focused Meta Crawling Through Web
    Kumar, Mukesh
    Vig, Renu
    2ND INTERNATIONAL CONFERENCE ON COMMUNICATION, COMPUTING & SECURITY [ICCCS-2012], 2012, 1 : 606 - 611
  • [24] A Fast Distributed Focused-Web Crawling
    Achsan, Harry T. Yani
    Wibowo, Wahyu Catur
    24TH DAAAM INTERNATIONAL SYMPOSIUM ON INTELLIGENT MANUFACTURING AND AUTOMATION, 2013, 2014, 69 : 492 - 499
  • [25] Focused web crawling in the acquisition of comparable corpora
    Tuomas Talvensaari
    Ari Pirkola
    Kalervo Järvelin
    Martti Juhola
    Jorma Laurikkala
    Information Retrieval, 2008, 11 : 427 - 445
  • [26] Focused Crawling for Building Web Comment Corpora
    Neunerdt, Melanie
    Niermann, Markus
    Mathar, Rudolf
    Trevisan, Bianka
    2013 IEEE CONSUMER COMMUNICATIONS AND NETWORKING CONFERENCE (CCNC), 2013, : 685 - 688
  • [27] Focused web crawling in the acquisition of comparable corpora
    Talvensaari, Tuomas
    Pirkola, Ari
    Järvelin, Kalervo
    Juhola, Martti
    Laurikkala, Jorma
    INFORMATION RETRIEVAL, 2008, 11 (05): : 427 - 445
  • [28] Hybrid focused crawling on the Surface and the Dark Web
    Iliou C.
    Kalpakis G.
    Tsikrika T.
    Vrochidis S.
    Kompatsiaris I.
    EURASIP Journal on Information Security, 2017 (1)
  • [29] An adaptive learning automata-based ranking function discovery algorithm
    Torkestani, Javad Akbari
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2012, 39 (02) : 441 - 459
  • [30] An adaptive learning automata-based ranking function discovery algorithm
    Javad Akbari Torkestani
    Journal of Intelligent Information Systems, 2012, 39 : 441 - 459