Anserini: Enabling the Use of Lucene for Information Retrieval Research

Cited by: 165
Authors
Yang, Peilin [1]
Fang, Hui [1]
Lin, Jimmy [2]
Affiliations
[1] Univ Delaware, Dept Elect & Comp Engn, Newark, DE 19716 USA
[2] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
Funding
National Science Foundation (USA); Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
10.1145/3077136.3080721
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Software toolkits play an essential role in information retrieval research. Most open-source toolkits developed by academics are designed to facilitate the evaluation of retrieval models over standard test collections. Efforts are generally directed toward better ranking and less attention is usually given to scalability and other operational considerations. On the other hand, Lucene has become the de facto platform in industry for building search applications (outside a small number of companies that deploy custom infrastructure). Compared to academic IR toolkits, Lucene can handle heterogeneous web collections at scale, but lacks systematic support for evaluation over standard test collections. This paper introduces Anserini, a new information retrieval toolkit that aims to provide the best of both worlds, to better align information retrieval practice and research. Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks. Our initial efforts have focused on three functionalities: scalable, multi-threaded inverted indexing to handle modern web-scale collections, streamlined IR evaluation for ad hoc retrieval on standard test collections, and an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines right out of the box. Experiments verify that our system is both efficient and effective, providing a solid foundation to support future research.
Pages: 1253-1256
Page count: 4
Related papers
50 in total
  • [1] Anserini Gets Dense Retrieval: Integration of Lucene's HNSW Indexes
    Ma, Xueguang
    Teofili, Tommaso
    Lin, Jimmy
    [J]. PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 5366 - 5370
  • [2] Anserini: Reproducible Ranking Baselines Using Lucene
    Yang, Peilin
    Fang, Hui
    Lin, Jimmy
    [J]. ACM JOURNAL OF DATA AND INFORMATION QUALITY, 2018, 10 (04):
  • [3] Solr Integration in the Anserini Information Retrieval Toolkit
    Clancy, Ryan
    Eskildsen, Toke
    Ruest, Nick
    Lin, Jimmy
    [J]. PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 1285 - 1288
  • [4] The Lucene for Information Access and Retrieval Research (LIARR) Workshop at SIGIR 2017
    Azzopardi, Leif
    Crane, Matt
    Fang, Hui
    Ingersoll, Grant
    Lin, Jimmy
    Moshfeghi, Yashar
    Scells, Harrisen
    Yang, Peilin
    Zuccon, Guido
    [J]. SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 1429 - 1430
  • [5] Information Retrieval Services Based on Lucene Architecture
    Li, Hang
    Li, Wanlong
    Wang, Guochun
    Peng, Xinyi
    [J]. INFORMATION COMPUTING AND APPLICATIONS, PT 1, 2012, 307 : 638 - 645
  • [6] The Study on Lucene Based IETM Information Retrieval
    Wu, Jiaju
    Liu, Zhenji
    Zhu, Xinglin
    Yu, Rong
    [J]. PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, INFORMATION MANAGEMENT AND NETWORK SECURITY, 2016, 47 : 221 - 224
  • [7] Research on Application of Lucene in Medical Image Retrieval System
    Cui, Wencheng
    Xu, Mengjia
    Sun, Huayu
    Shao, Hong
    [J]. 2011 INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND NETWORK TECHNOLOGY (ICCSNT), VOLS 1-4, 2012, : 661 - 664
  • [8] Efficient Information Retrieval using Lucene, LIndex and HIndex in Hadoop
    Mathew, Anita Brigit
    Pattnaik, Priyabrat
    Kumar, S. D. Madhu
    [J]. 2014 IEEE/ACS 11TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2014, : 333 - 340
  • [9] Intelligent Retrieval Knowledge Repository Model Design Based on Lucene Research
    Luo, Gang
    Xu, Hong-feng
    [J]. 2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS, MACHINERY AND MATERIALS (IIMM 2015), 2015, : 91 - 94
  • [10] Internet Information Retrieval for Enabling Student Projects
    Mohamed, Nader
    Al-Jaroodi, Jameela
    Jawhar, Imad
    [J]. PROCEEDINGS OF THE 2009 SIXTH INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY: NEW GENERATIONS, VOLS 1-3, 2009, : 987 - 992