Apoidea: A decentralized Peer-to-Peer architecture for crawling the World Wide Web

Authors
Singh, A [1]
Srivatsa, M [1]
Liu, L [1]
Miller, T [1]
Affiliation
[1] Georgia Institute of Technology, College of Computing, Atlanta, GA 30332, USA
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
This paper describes a decentralized peer-to-peer model for building a Web crawler. Most current systems use a centralized client-server model: the crawl itself may be performed by one or more tightly coupled machines, but the distribution of crawling jobs and the collection of crawled results are managed centrally through a single URL repository. Centralized solutions are known to suffer from link congestion, a single point of failure, and expensive administration. They also require both horizontal and vertical scalability solutions to manage Network File Systems (NFS) and to load-balance DNS and HTTP requests. In this paper, we present Apoidea, a completely distributed and decentralized Peer-to-Peer (P2P) crawler that is self-managing and exploits the geographical proximity of web resources to peers for a better and faster crawl. We use protocols based on Distributed Hash Tables (DHTs) to perform the critical URL-duplicate and content-duplicate tests.
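The DHT-based duplicate tests mentioned in the abstract can be illustrated with a minimal sketch. It assumes a Chord-style consistent-hashing ring with SHA-1 identifiers, which is one common DHT design and not necessarily the exact protocol Apoidea uses: the hash of a URL (or of a page body) decides which peer owns that key, and only the owner records that the key has been seen, so no central URL repository is needed. The classes `Peer` and `Ring` and the methods `is_new_url` and `is_new_content` are hypothetical names, and the ring is simulated in a single process rather than over a network.

```python
import bisect
import hashlib


def sha1_int(data: str) -> int:
    """Map a string to an integer key on a 160-bit identifier ring."""
    return int.from_bytes(hashlib.sha1(data.encode("utf-8")).digest(), "big")


class Peer:
    """Hypothetical crawler peer that owns one slice of the DHT key space."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.node_id = sha1_int(name)
        self.seen_urls: set[int] = set()      # URL keys this peer is responsible for
        self.seen_content: set[int] = set()   # content keys this peer is responsible for


class Ring:
    """Hypothetical in-process consistent-hashing ring (Chord-style successor rule)."""

    def __init__(self, peers: list[Peer]) -> None:
        self.peers = sorted(peers, key=lambda p: p.node_id)
        self.ids = [p.node_id for p in self.peers]

    def owner(self, key: int) -> Peer:
        # The owner is the first peer whose id is >= key, wrapping past the largest id.
        i = bisect.bisect_left(self.ids, key)
        return self.peers[i % len(self.peers)]

    def _check_and_insert(self, key: int, table: set[int]) -> bool:
        # Returns True only the first time the owning peer sees this key.
        if key in table:
            return False
        table.add(key)
        return True

    def is_new_url(self, url: str) -> bool:
        """URL-duplicate test: exact-string match on the URL (no normalization)."""
        key = sha1_int(url)
        return self._check_and_insert(key, self.owner(key).seen_urls)

    def is_new_content(self, page_body: str) -> bool:
        """Content-duplicate test: exact match on the page body via its SHA-1 hash."""
        key = sha1_int(page_body)
        return self._check_and_insert(key, self.owner(key).seen_content)
```

For example, with `ring = Ring([Peer(f"peer-{i}") for i in range(4)])`, the first call to `ring.is_new_url("http://example.com/")` returns `True` and a repeated call returns `False`, whichever peer issues it. URLs and page bodies are compared as exact strings here; a real crawler would normalize URLs first, and detecting near-duplicate pages would need a different fingerprint than an exact SHA-1 hash.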
Pages: 126-142
Page count: 17
Related papers
First 10 of 50 shown
  • [1] A peer-to-peer architecture for Web annotation sharing
    Yang, CZ
    Chen, SC
    Chen, IX
    [J]. DIGITAL LIBRARIES: IMPLEMENTING STRATEGIES AND SHARING EXPERIENCES, PROCEEDINGS, 2005, 3815 : 493 - 494
  • [2] A Web Based Peer-to-Peer RFID Architecture
    Fernando, Harinda
    Mahdin, Hairulnizam
    [J]. RECENT ADVANCES ON SOFT COMPUTING AND DATA MINING, 2017, 549 : 549 - 559
  • [3] Small world architecture for peer-to-peer networks
    Liu, Lu
    Mackin, Stephen
    Antonopoulos, Nick
    [J]. 2006 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WORKSHOPS PROCEEDINGS, 2006, : 451 - +
  • [4] Decentralized Peer-to-Peer Auctions
    Marcus Fontoura
    Mihail Ionescu
    Naftaly Minsky
    [J]. Electronic Commerce Research, 2005, 5 (1) : 7 - 24
  • [5] Collaborative web caching system based on peer-to-peer architecture
    Ling, Bo
    Wang, Xiao-Yu
    Zhou, Ao-Ying
    Ng, Wee-Siong
    [J]. Jisuanji Xuebao/Chinese Journal of Computers, 2005, 28 (02): 170 - 178
  • [6] An Autonomic Peer-to-Peer Architecture for Hosting Stateful Web Services
    Reich, Christoph
    Bubendorfer, Kris
    Buyya, Rajkumar
    [J]. CCGRID 2008: EIGHTH IEEE INTERNATIONAL SYMPOSIUM ON CLUSTER COMPUTING AND THE GRID, VOLS 1 AND 2, PROCEEDINGS, 2008, : 250 - +
  • [7] Peer-to-peer based QoS registry architecture for web services
    Li, Fei
    Yang, Fangchun
    Shuang, Kai
    Su, Sen
    [J]. DISTRIBUTED APPLICATIONS AND INTEROPERABLE SYSTEMS, PROCEEDINGS, 2007, 4531 : 133 - +
  • [8] A decentralized LR-PON architecture supporting efficient peer-to-peer communication
    Li, Yan
    [J]. OPTIK, 2013, 124 (20): 4602 - 4606
  • [9] Peer-to-peer and web computing
    Haridi, S
    Aberer, K
    Van Roy, P
    Colajanni, M
    [J]. EURO-PAR 2004 PARALLEL PROCESSING, PROCEEDINGS, 2004, 3149 : 1013 - 1013
  • [10] Web services and Peer-to-Peer
    Hillenbrand, M
    Müller, P
    [J]. PEER-TO-PEER SYSTEMS AND APPLICATIONS, 2005, 3485 : 207 - 224