Anserini: Enabling the Use of Lucene for Information Retrieval Research

Cited by: 165
Authors
Yang, Peilin [1]
Fang, Hui [1]
Lin, Jimmy [2]
Affiliations
[1] Univ Delaware, Dept Elect & Comp Engn, Newark, DE 19716 USA
[2] Univ Waterloo, David R Cheriton Sch Comp Sci, Waterloo, ON, Canada
Funding
U.S. National Science Foundation; Natural Sciences and Engineering Research Council of Canada
DOI
10.1145/3077136.3080721
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Software toolkits play an essential role in information retrieval research. Most open-source toolkits developed by academics are designed to facilitate the evaluation of retrieval models over standard test collections. Efforts are generally directed toward better ranking and less attention is usually given to scalability and other operational considerations. On the other hand, Lucene has become the de facto platform in industry for building search applications (outside a small number of companies that deploy custom infrastructure). Compared to academic IR toolkits, Lucene can handle heterogeneous web collections at scale, but lacks systematic support for evaluation over standard test collections. This paper introduces Anserini, a new information retrieval toolkit that aims to provide the best of both worlds, to better align information retrieval practice and research. Anserini provides wrappers and extensions on top of core Lucene libraries that allow researchers to use more intuitive APIs to accomplish common research tasks. Our initial efforts have focused on three functionalities: scalable, multi-threaded inverted indexing to handle modern web-scale collections, streamlined IR evaluation for ad hoc retrieval on standard test collections, and an extensible architecture for multi-stage ranking. Anserini ships with support for many TREC test collections, providing a convenient way to replicate competitive baselines right out of the box. Experiments verify that our system is both efficient and effective, providing a solid foundation to support future research.
Pages: 1253-1256
Number of pages: 4