An adaptive focused Web crawling algorithm based on learning automata

Cited by: 28
Author
Torkestani, Javad Akbari [1]
Affiliation
[1] Islamic Azad Univ, Arak Branch, Arak, Iran
Keywords
Web crawling; Focused Web crawler; Search engine; Learning automata; AD-HOC NETWORKS; SEMANTIC WEB; INFORMATION; RETRIEVAL; SYSTEM; GRAPH
DOI
10.1007/s10489-012-0351-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed the birth and explosive growth of the Web. This exponential growth has made the Web a huge source of information in which finding a document without an efficient search engine is unimaginable. Web crawling has become an important aspect of Web search, on which the performance of search engines strongly depends. Focused Web crawlers try to focus the crawling process on topic-relevant Web documents. Topic-oriented crawlers are widely used in domain-specific Web search portals and personalized search tools. This paper designs a decentralized learning automata-based focused Web crawler. Taking advantage of learning automata, the proposed crawler learns the most relevant URLs and the promising paths leading to the target on-topic documents. It can effectively adapt its configuration to the Web dynamics. The crawler is expected to have a higher precision rate because it constructs a small Web graph of only on-topic documents. Based on the martingale theorem, the convergence of the proposed algorithm is proved. To show the performance of the proposed crawler, extensive simulation experiments are conducted. The obtained results show the superiority of the proposed crawler over several existing methods in terms of precision, recall, and running time. The t-test is used to verify the statistical significance of the precision results of the proposed crawler.
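The abstract does not state which learning-automaton update rule the crawler uses, so the following is only a minimal sketch of the general idea, assuming the standard linear reward-inaction (L_RI) scheme: one automaton per page holds a probability for each outgoing link and raises the probability of a link whenever following it leads to an on-topic document. The names `LinkAutomaton`, `simulate`, and the `relevance` table are hypothetical and are not taken from the paper.

```python
import random


class LinkAutomaton:
    """Variable-structure learning automaton with one action per outgoing link.

    Assumption: linear reward-inaction (L_RI) updates; the paper's actual
    scheme is not given in the abstract.
    """

    def __init__(self, links, learning_rate=0.1, rng=None):
        self.links = list(links)
        if not self.links:
            raise ValueError("at least one link is required")
        self.a = learning_rate
        self.rng = rng or random.Random()
        # Start with a uniform action-probability vector.
        self.p = [1.0 / len(self.links)] * len(self.links)

    def choose(self):
        """Sample a link index according to the current probabilities."""
        return self.rng.choices(range(len(self.links)), weights=self.p, k=1)[0]

    def reward(self, i):
        """L_RI reward: p_i <- p_i + a(1 - p_i), p_j <- (1 - a) p_j for j != i."""
        self.p = [(1.0 - self.a) * pj for pj in self.p]
        self.p[i] += self.a
        # On a penalty L_RI leaves the vector unchanged, so no method is needed.


def simulate(relevance, steps=2000, seed=0):
    """Toy environment: `relevance` maps each link to the (hypothetical)
    probability that the page behind it is on-topic."""
    la = LinkAutomaton(relevance.keys(), learning_rate=0.05,
                       rng=random.Random(seed))
    env = random.Random(seed + 1)
    for _ in range(steps):
        i = la.choose()
        if env.random() < relevance[la.links[i]]:
            la.reward(i)  # followed link led to an on-topic document
    return dict(zip(la.links, la.p))


if __name__ == "__main__":
    print(simulate({"/sports": 0.1, "/ai/crawlers": 0.9, "/news": 0.3}))
```

Under this scheme the probability vector always sums to one, and links that are rewarded more often tend to accumulate probability mass, which is one way a crawler could come to prefer promising paths.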
Pages: 586-601
Page count: 16