TOWARDS A DISTRIBUTED SEARCH ENGINE

Authors
Baeza-Yates, Ricardo [1]
Affiliations
[1] Yahoo Res, Barcelona, Spain
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Distributed search engines are often more complex to implement than centralized ones. Distributing a search engine across multiple sites, however, has several advantages. In particular, it allows the engine to use fewer computing resources and to exploit data and user locality. In this presentation we show the feasibility of distributed Web search engines by proposing a model for assessing the total cost of a distributed Web search engine, including both the computational costs and the communication cost among all distributed sites. Using examples, we show that a distributed Web search engine can be more cost-effective than a centralized one if a large percentage of queries are local, which is usually the case. We then present a query-processing algorithm that maximizes the number of queries answered locally, without sacrificing the quality of the results, by using caching and partial replication. We simulate our algorithm on real document collections and real query workloads to measure the actual parameters needed for our cost model, and we show that a distributed search engine can be competitive with a centralized architecture with respect to cost. This is joint work with Aris Gionis, Flavio Junqueira, Vassilis Plachouras and Luca Telloli.
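The abstract does not give the cost model itself, so the following is only a minimal sketch of the kind of comparison it describes: total cost as computation plus inter-site communication, driven by the fraction of queries that can be answered locally. Every name and parameter here (`CostParams`, `central_compute`, `site_compute`, `forward_comm`, `remote_sites`) is a hypothetical illustration, and the linear cost form is an assumption, not the authors' model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostParams:
    """Hypothetical per-query costs in arbitrary units (assumed, not from the source)."""
    central_compute: float  # cost of answering one query at a centralized engine
    site_compute: float     # cost of answering one query at a single distributed site
    forward_comm: float     # communication cost of forwarding one query to one remote site
    remote_sites: int       # remote sites contacted when a query is not local


def centralized_cost(num_queries: int, p: CostParams) -> float:
    """Total cost when every query is processed by the centralized engine."""
    return num_queries * p.central_compute


def remote_overhead(p: CostParams) -> float:
    """Extra cost of one non-local query: forward to and process at each remote site."""
    return p.remote_sites * (p.forward_comm + p.site_compute)


def distributed_cost(num_queries: int, local_fraction: float, p: CostParams) -> float:
    """Total cost when a fraction of queries is answered entirely at the local site."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be in [0, 1]")
    local = num_queries * local_fraction
    non_local = num_queries - local
    # Every query pays the local site's computation; non-local ones also pay the overhead.
    return local * p.site_compute + non_local * (p.site_compute + remote_overhead(p))


def break_even_local_fraction(p: CostParams) -> Optional[float]:
    """Local fraction at which both architectures cost the same, or None if outside [0, 1]."""
    overhead = remote_overhead(p)
    if overhead == 0:
        return None
    fraction = 1.0 - (p.central_compute - p.site_compute) / overhead
    return fraction if 0.0 <= fraction <= 1.0 else None


if __name__ == "__main__":
    params = CostParams(central_compute=10.0, site_compute=4.0, forward_comm=2.0, remote_sites=3)
    for f in (0.5, 0.9):
        print(f, distributed_cost(1000, f, params), centralized_cost(1000, params))
    print("break-even local fraction:", break_even_local_fraction(params))
```

Under these assumed numbers, the distributed configuration is cheaper only when the local fraction exceeds the break-even value, which mirrors the abstract's claim that a large share of local queries is what makes distribution cost-effective. Caching and partial replication would enter such a sketch as ways of raising `local_fraction`.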
Pages: IS13 / IS13
Page count: 1