The anatomy of a large-scale hypertextual Web search engine

Cited by: 6730
Authors
Brin, S [1 ]
Page, L [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Stanford, CA 94305 USA
Source
COMPUTER NETWORKS AND ISDN SYSTEMS | 1998 / Vol. 30 / Issues 1-7
Keywords
World Wide Web; search engines; information retrieval; PageRank; Google;
DOI
10.1016/S0169-7552(98)00110-X
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
In this paper, we present Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext. Google is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems. The prototype with a full text and hyperlink database of at least 24 million pages is available at http://google.stanford.edu/. To engineer a search engine is a challenging task. Search engines index tens to hundreds of millions of Web pages involving a comparable number of distinct terms. They answer tens of millions of queries every day. Despite the importance of large-scale search engines on the Web, very little academic research has been done on them. Furthermore, due to rapid advance in technology and Web proliferation, creating a Web search engine today is very different from three years ago. This paper provides an in-depth description of our large-scale Web search engine - the first such detailed public description we know of to date. Apart from the problems of scaling traditional search techniques to data of this magnitude, there are new technical challenges involved with using the additional information present in hypertext to produce better search results. This paper addresses this question of how to build a practical large-scale system which can exploit the additional information present in hypertext. Also we look at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want. © 1998 Published by Elsevier Science B.V. All rights reserved.
Pages: 107-117
Page count: 11
Related papers
50 in total
  • [1] RETRACTED: The Anatomy of a Large-Scale Hyper Textual Web Search Engine (Retracted Article)
    Sehgal, Umesh
    Kaur, Kuljeet
    Kumar, Pawan
    SECOND INTERNATIONAL CONFERENCE ON COMPUTER AND ELECTRICAL ENGINEERING, VOL 2, PROCEEDINGS, 2009, : 491 - +
  • [2] The anatomy of a large-scale hypertextual web search engine (Reprint from COMPUTER NETWORKS AND ISDN SYSTEMS, vol 30, pg 107-117, 1998)
    Brin, Sergey
    Page, Lawrence
    COMPUTER NETWORKS, 2012, 56 (18) : 3825 - 3833
  • [3] A hierarchical cache scheme for the large-scale web search engine
    Lim, Sungchae
    Ahn, Joonseon
    PROCEEDINGS OF NINTH ACIS INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ARTIFICIAL INTELLIGENCE, NETWORKING AND PARALLEL/DISTRIBUTED COMPUTING, 2008, : 925 - +
  • [4] Linguistics in large-scale Web search
    Gulla, JA
    Auran, PG
    Risvik, KM
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, 2002, 2553 : 218 - 222
  • [5] A Web service search engine for large-scale Web service discovery based on the probabilistic topic modeling and clustering
    Bukhari, Afnan
    Liu, Xumin
    SERVICE ORIENTED COMPUTING AND APPLICATIONS, 2018, 12 (02) : 169 - 182
  • [6] Large-scale duplicate detection for web image search
    Wang, Bin
    Li, Zhiwei
    Li, Mingjing
    Ma, Wei-Ying
    2006 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO - ICME 2006, VOLS 1-5, PROCEEDINGS, 2006, : 353 - +
  • [7] WSCE: A crawler engine for large-scale discovery of web services
    Al-Masri, Eyhab
    Mahmoud, Qusay H.
    2007 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES, PROCEEDINGS, 2007, : 1104 - +
  • [8] Challenges in Using Peer-to-Peer Structures in Order to Design a Large-Scale Web Search Engine
    Mousavi, Hamid
    Movaghar, Ali
    ADVANCES IN COMPUTER SCIENCE AND ENGINEERING, 2008, 6 : 461 - 468
  • [9] Analysis of the user log for a large-scale Chinese search engine
    Wang, Ji-Min
    Chen, Chong
    Peng, Bo
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2004, 32 (SUPPL.) : 1 - 5
  • [10] GIGGLE: a search engine for large-scale integrated genome analysis
    Layer, Ryan M.
    Pedersen, Brent S.
    DiSera, Tonya
    Marth, Gabor T.
    Gertz, Jason
    Quinlan, Aaron R.
    NATURE METHODS, 2018, 15 (02) : 123 - +