Smart Focused Web Crawler for Hidden Web

Cited by: 1
Authors
Kaur, Sawroop [1]
Geetha, G. [1]
Affiliation
[1] Lovely Professional University, Phagwara, Punjab, India
Keywords
Hidden web; Focused crawler; MapReduce
DOI
10.1007/978-981-13-0586-3_42
Chinese Library Classification (CLC) number
TP301 [Theory; Methods]
Subject classification code
081202
Abstract
A huge amount of useful data is buried in the hidden web, where it becomes accessible only after users fill in and submit search forms. Web crawlers can reach this data only by interacting with web-based search forms, and traditional search engines cannot efficiently search and index these deep (hidden) web pages. Retrieving hidden web data with both high accuracy and high coverage is therefore a challenging task. Focused crawling aims to ensure that the documents it retrieves belong to a particular topic. The proposed architecture, a smart focused web crawler for the hidden web, is based on XML parsing of web pages: it first finds hidden web pages and then learns their features. A classifier based on term frequency-inverse document frequency (TF-IDF) will be built to identify relevant pages, using a completely automatic adaptive learning technique. The system is intended to increase the coverage and accuracy of the retrieved web pages. For distributed processing, the Hadoop MapReduce framework will be used.
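The abstract does not specify how the TF-IDF classifier is built. The following is a minimal sketch, assuming a centroid-based classifier (a swapped-in choice, not necessarily the authors' method). It scores a page by the cosine similarity between its TF-IDF vector and the mean TF-IDF vector of known relevant pages. The smoothed IDF formula, the 0.2 threshold, and all function and class names are hypothetical. The `adapt` method is only an illustrative stand-in for the adaptive learning step: it folds each newly accepted page into the centroid as a running mean.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def compute_idf(documents):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per document
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    """TF-IDF weights for one page; terms unseen during training are dropped."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TfidfRelevanceClassifier:
    """Hypothetical centroid classifier: relevant if close to the topic centroid."""

    def __init__(self, threshold=0.2):
        self.threshold = threshold  # hypothetical cut-off, not from the paper
        self.idf = {}
        self.centroid = {}
        self.n_relevant = 0

    def fit(self, relevant_pages, all_pages):
        """Learn IDF from all training pages and a centroid from relevant ones."""
        self.idf = compute_idf(all_pages)
        self.centroid = {}
        self.n_relevant = 0
        for page in relevant_pages:
            self.adapt(page)
        return self

    def adapt(self, page):
        """Fold one relevant page into the centroid as a running mean.

        The IDF table stays fixed after fit(); only the centroid adapts.
        """
        self.n_relevant += 1
        vec = tfidf_vector(page, self.idf)
        for term in set(self.centroid) | set(vec):
            old = self.centroid.get(term, 0.0)
            self.centroid[term] = old + (vec.get(term, 0.0) - old) / self.n_relevant

    def score(self, page):
        return cosine(tfidf_vector(page, self.idf), self.centroid)

    def is_relevant(self, page):
        return self.score(page) >= self.threshold


train_all = [
    "book search form author title isbn",
    "find books by author or title",
    "weather forecast rain sunny",
    "football match scores league",
]
train_relevant = train_all[:2]
clf = TfidfRelevanceClassifier(threshold=0.2).fit(train_relevant, train_all)
print(clf.is_relevant("search books by isbn"))    # True
print(clf.is_relevant("rain and sunny weather"))  # False
```

In a crawler, a check like `is_relevant` would decide whether a fetched page's links enter the crawl frontier. The XML parsing of pages and the Hadoop MapReduce distribution mentioned in the abstract are outside the scope of this sketch.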
Pages: 419-427
Number of pages: 9