Comprehensive Characterization of an Open Source Document Search Engine

Cited by: 3
Authors
Hadjilambrou, Zacharias [1]
Kleanthous, Marios [1]
Antoniou, Georgia [1]
Portero, Antoni [2, 3]
Sazeides, Yiannakis [1]
Affiliations
[1] Univ Cyprus, 1 Univ Ave, CY-2109 Aglantzia, Cyprus
[2] IT4Innovations, Ostrava, Czech Republic
[3] VSB Univ Ostrava, IT4Innovations, Ostrava 70833, Czech Republic
Keywords
Document search; index partitioning; parallel index search; parallelism; characterization; real hardware; measurement; evaluation; performance; experimentation; WEB SEARCH;
DOI
10.1145/3320346
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent research issues in online document search. In particular, we study how intra-server index partitioning affects response time and throughput, explore the potential use of low-power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies, but with diminishing benefits as incoming query traffic increases, (b) low-power servers, given enough partitioning, can provide the same average and tail response times as conventional high-performance servers, (c) index search is a CPU-intensive, cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.
Pages: 21