NoSQL Web Crawler Application

Cited by: 0
Authors
Deka, Ganesh Chandra [1]
Affiliations
[1] Minist Skill Dev & Entrepreneurship, Directorate Gen Training, New Delhi, India
DOI
10.1016/bs.adcom.2017.08.001
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
With the advent of Web technology, the Web has filled with unstructured data, often called Big Data. However, these data are not easy to collect, access, and process at large scale. Web crawling is an optimization problem. Site-specific crawling of social media platforms, e-commerce websites, blogs, news websites, and forums is a requirement for business organizations that need to answer search queries from webpages. Indexing a huge number of webpages requires a cluster with several petabytes of usable disk. Because NoSQL databases are highly scalable, their use for storing crawler data is increasing along with their growing popularity. This chapter discusses the application of NoSQL databases in Web crawler applications to store the data collected by the crawler.
Pages: 77-100
Number of pages: 24
Related papers
  • [1] Study And Application of Web Crawler Algorithm Based on Heritrix
    Liu, DongFei
    Fan, XianShuang
    ADVANCED RESEARCH ON INFORMATION SCIENCE, AUTOMATION AND MATERIAL SYSTEM, PTS 1-6, 2011, 219-220 : 1069 - 1072
  • [2] Design and Application of Intelligent Dynamic Crawler for Web Data Mining
    Zheng Guojun
    Jia Wenchao
    Shi Jihui
    Shi Fan
    Zhu Hao
    Liu Jiang
    2017 32ND YOUTH ACADEMIC ANNUAL CONFERENCE OF CHINESE ASSOCIATION OF AUTOMATION (YAC), 2017, : 1098 - 1105
  • [3] Design Crawler: A Web Application For Digital Design Metadata Analysis
    Hosny, Sherif
    Baher, Amr
    2019 20TH INTERNATIONAL WORKSHOP ON MICROPROCESSOR/SOC TEST, SECURITY AND VERIFICATION (MTV 2019), 2019, : 31 - 34
  • [4] Application of bloom filter for duplicate URL detection in a web crawler
    Kapoor, Aveksha
    Arora, Vinay
    2016 IEEE 2ND INTERNATIONAL CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (IEEE CIC), 2016, : 246 - 255
  • [5] IMPLEMENTATION OF WEB CRAWLER
    Gupta, Pooja
    Johari, Kalpana
    2009 SECOND INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN ENGINEERING AND TECHNOLOGY (ICETET 2009), 2009, : 775 - 780
  • [6] Reducing web crawler overhead using mobile crawler
    M.E. Computer Science and Engineering, Arunai Engineering College, Tiruvannamalai-606 603, Tamil Nadu, India
    Unknown
    Int. Conf. Emerg. Trends Electr. Comput. Technol., ICETECT, 2011, (926-932):
  • [7] An architecture for a focused trend parallel Web crawler with the application of clickstream analysis
    Ahmadi-Abkenari, Fatemeh
    Selamat, Ali
    INFORMATION SCIENCES, 2012, 184 (01) : 266 - 281
  • [8] Performance Aspects of Migrating a Web Application from a Relational to a NoSQL Database
    Harezlak, Katarzyna
    Skowron, Robert
    BEYOND DATABASES, ARCHITECTURES AND STRUCTURES, BDAS 2015, 2015, 521 : 107 - 115
  • [9] Web Crawler for searching Deep web sites
    Patil, Tejaswini Arun
    Chobe, Santosh
    2017 INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION, CONTROL AND AUTOMATION (ICCUBEA), 2017,
  • [10] Design of a Mobile Web Crawler for Hidden Web
    Kumar, Manish
    Bhatia, Rajesh
    2016 3rd International Conference on Recent Advances in Information Technology (RAIT), 2016, : 186 - 190