Development of a scalable web crawler

Authors
Takano, H [1]
Kubo, N [1]
Affiliation
[1] NEC Corp Ltd, C&C Media Res Labs, Tokyo, Japan
Source
NEC RESEARCH & DEVELOPMENT | 1999 / Vol. 40 / No. 3
Keywords
World Wide Web (WWW); web crawler; web search service; HTTP (Hypertext Transfer Protocol); HTML (Hypertext Markup Language); parallel architecture
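DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809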
Abstract
This paper describes the Nexplorer web-crawler system, which has a scalable architecture. A web crawler is an indispensable component of Web search services. It should perform well enough to gather millions of pages as fast as possible, and it should be configurable to meet various service demands. To meet these requirements, we have designed Nexplorer as a parallel system that is configurable through several parameters. Nexplorer has been used in NETPLAZA, NEC's practical search service, and we have confirmed through experience there that its performance is high enough to keep downloaded Web pages as fresh as possible.
Pages: 334 - 339
Number of pages: 6