DSphere: A source-centric approach to crawling, indexing and searching the world wide web

Authors
Bamba, Bhuvan [1]
Liu, Ling [1]
Caverlee, James [1]
Padliya, Vaibhav [1]
Srivatsa, Mudhakar [1]
Bansal, Tushar [1]
Palekar, Mahesh [1]
Patrao, Joseph [1]
Li, Suiyang [1]
Singh, Aameek [1]
Affiliation
[1] Georgia Inst Technol, Coll Comp, Atlanta, GA 30332 USA
Source
2007 IEEE 23rd International Conference on Data Engineering, Vols 1-3 | 2007
DOI
Not available
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
We describe DSphere, a decentralized system for crawling, indexing, searching and ranking documents on the World Wide Web. Unlike most existing search technologies, which depend heavily on a page-centric view of the Web, we advocate a source-centric view and propose a decentralized architecture for crawling, indexing and searching the Web in a distributed, source-specific fashion. We develop a fully decentralized crawler in which each peer is responsible for crawling a specific set of documents, referred to as a source collection. Documents are ranked using link analysis techniques. Traditional link analysis techniques suffer from problems such as slow refresh rates and vulnerability to Web spam. We propose a source-based link analysis approach that computes fast and accurate ranking scores for all crawled documents.
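The abstract names two mechanisms without detailing them: assigning each peer a source collection, and ranking at the level of sources rather than individual pages. The sketch below illustrates both ideas under loudly stated assumptions. It is not DSphere's actual algorithm. A "source" is assumed to be a URL's host name. `assign_peer` is a hypothetical hash-based mapping from sources to peers. `source_rank` is a standard PageRank-style power iteration run over a graph of weighted inter-source links, substituted here as a generic stand-in for the paper's source-based link analysis.

```python
from collections import defaultdict
from hashlib import sha1
from urllib.parse import urlsplit


def source_of(url: str) -> str:
    """Map a page URL to its source (assumption: a source is the host name)."""
    return urlsplit(url).hostname or ""


def assign_peer(source: str, num_peers: int) -> int:
    """Hypothetical assignment: hash the source name to choose the responsible peer."""
    digest = sha1(source.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


def source_graph(page_links):
    """Collapse page-level links into weighted inter-source edges.

    Links between pages of the same source are dropped, so only
    cross-source links influence the source-level ranking.
    """
    edges = defaultdict(lambda: defaultdict(int))
    sources = set()
    for src_url, dst_url in page_links:
        s, d = source_of(src_url), source_of(dst_url)
        sources.update((s, d))
        if s != d:
            edges[s][d] += 1
    return sources, edges


def source_rank(sources, edges, damping=0.85, iters=50):
    """PageRank-style power iteration over the weighted source graph."""
    n = len(sources)
    rank = {s: 1.0 / n for s in sources}
    for _ in range(iters):
        new = {s: (1.0 - damping) / n for s in sources}
        dangling = 0.0
        for s in sources:
            out = edges.get(s)  # .get avoids creating empty entries
            if not out:
                # Sources with no outgoing cross-source links spread rank uniformly.
                dangling += rank[s]
                continue
            total = sum(out.values())
            for d, w in out.items():
                new[d] += damping * rank[s] * w / total
        for s in sources:
            new[s] += damping * dangling / n
        rank = new
    return rank
```

Ranking a few thousand sources instead of billions of pages is one plausible reason a source-level computation could refresh faster. The grouping also means that many spam pages inside a single host add no extra weight, because intra-source links are ignored. Both points are intuitions about this sketch, not measured claims about DSphere.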
Pages: 1490+
Number of pages: 2