An adaptive focused Web crawling algorithm based on learning automata

Cited by: 28
Authors
Torkestani, Javad Akbari [1]
Affiliations
[1] Islamic Azad Univ, Arak Branch, Arak, Iran
Keywords
Web crawling; Focused Web crawler; Search engine; Learning automata; AD-HOC NETWORKS; SEMANTIC WEB; INFORMATION; RETRIEVAL; SYSTEM; GRAPH
DOI
10.1007/s10489-012-0351-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have witnessed the birth and explosive growth of the Web. This exponential growth has made the Web a huge source of information in which finding a document without an efficient search engine is unimaginable. Web crawling has become an important aspect of Web search, and the performance of search engines depends strongly on it. Focused Web crawlers try to restrict the crawling process to topic-relevant Web documents. Topic-oriented crawlers are widely used in domain-specific Web search portals and personalized search tools. This paper designs a decentralized learning automata-based focused Web crawler. Taking advantage of learning automata, the proposed crawler learns the most relevant URLs and the promising paths leading to the target on-topic documents. It can effectively adapt its configuration to the Web dynamics. The crawler is expected to have a higher precision rate because it constructs a small Web graph of only on-topic documents. The convergence of the proposed algorithm is proved based on the martingale theorem. To show the performance of the proposed crawler, extensive simulation experiments are conducted. The results show the superiority of the proposed crawler over several existing methods in terms of precision, recall, and running time. A t-test is used to verify the statistical significance of the precision results of the proposed crawler.
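The abstract does not spell out the update rule the crawler uses, so the following is only a minimal sketch of the general idea of a learning automaton choosing among a page's outgoing links. It assumes a standard linear reward-inaction scheme; the class name `LinkAutomaton`, the `train` helper, the `reward_rate` parameter, and the relevance predicate `is_on_topic` are hypothetical names introduced for illustration and are not taken from the paper.

```python
import random


class LinkAutomaton:
    """Hypothetical variable-structure learning automaton over outgoing links."""

    def __init__(self, links, reward_rate=0.1, rng=None):
        if not links:
            raise ValueError("at least one link is required")
        self.links = list(links)
        self.reward_rate = reward_rate  # assumed learning parameter
        self.rng = rng or random.Random()
        # Start from a uniform action probability vector.
        self.probs = [1.0 / len(self.links)] * len(self.links)

    def choose(self):
        """Sample a link index according to the current probabilities."""
        return self.rng.choices(range(len(self.links)), weights=self.probs)[0]

    def reward(self, chosen):
        """Linear reward-inaction update: move probability toward `chosen`."""
        a = self.reward_rate
        for i in range(len(self.probs)):
            if i == chosen:
                self.probs[i] += a * (1.0 - self.probs[i])
            else:
                self.probs[i] *= 1.0 - a


def train(automaton, is_on_topic, steps=500):
    """Reward the chosen link whenever it is judged on-topic.

    Under reward-inaction, an off-topic choice leaves the vector unchanged.
    """
    for _ in range(steps):
        i = automaton.choose()
        if is_on_topic(automaton.links[i]):
            automaton.reward(i)
    return automaton
```

In this sketch, links that repeatedly lead to on-topic documents accumulate probability mass, so they are followed more often in later steps. A real crawler would replace `is_on_topic` with a relevance score computed from the fetched document.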
Pages: 586-601
Number of pages: 16
Related papers
  • [1] An adaptive focused Web crawling algorithm based on learning automata
    Javad Akbari Torkestani
    Applied Intelligence, 2012, 37: 586-601
  • [2] Focused Web Crawling: A Framework for Crawling of Country Based Financial Data
    Dey, Manas Kanti
    Chowdhury, Hasan Md Suhag
    Shamanta, Debakar
    Ahmed, Khandakar Entenam Unayes
    2010 2nd IEEE International Conference on Information and Financial Engineering (ICIFE), 2010: 409-412
  • [3] An ontology learning based approach for focused web crawling using combined normalized pointwise mutual information and Resnik algorithm
    Joe Dhanith P.R.
    Surendiran B.
    International Journal of Computers and Applications, 2022, 44(12): 1123-1129
  • [4] Focused Web Crawling Algorithms
    Amrin, Andas
    Xia, Chunlei
    Dai, Shuguang
    Journal of Computers, 2015, 10(04): 245-251
  • [5] Focused crawling for the hidden web
    Liakos, Panagiotis
    Ntoulas, Alexandros
    Labrinidis, Alexandros
    Delis, Alex
    World Wide Web-Internet and Web Information Systems, 2016, 19(04): 605-631
  • [6] Focused crawling for the hidden web
    Panagiotis Liakos
    Alexandros Ntoulas
    Alexandros Labrinidis
    Alex Delis
    World Wide Web, 2016, 19: 605-631
  • [7] A Web-Based Semantic Focused Crawling Approach
    Liu, Yongjian
    Ma, Deng
    Sun, Jianpeng
    2013 International Conference on Cyber Science and Engineering (CyberSE 2013), 2013: 287-293
  • [8] An adaptive honeypot deployment algorithm based on learning automata
    Zhang, Yan
    Di, Chong
    Han, Zhuoran
    Li, Yichen
    Li, Shenghong
    2017 IEEE Second International Conference on Data Science in Cyberspace (DSC), 2017: 521-527
  • [9] Learning Automata-based Adaptive Web Services Composition
    Li, Guoqiang
    Song, Dandan
    Liao, Lejian
    Sun, Fuzhen
    Du, Jianguang
    2014 5th IEEE International Conference on Software Engineering and Service Science (ICSESS), 2014: 792-795
  • [10] Effects of Crawling Strategies on the Performance of Focused Web Crawling
    Pirkola, Ari
    Talvensaari, Tuomas
    WEBIST 2009: Proceedings of the Fifth International Conference on Web Information Systems and Technologies, 2009: 376-381