LETOR: A benchmark collection for research on learning to rank for information retrieval

Times cited: 251
Authors
Qin, Tao [1]
Liu, Tie-Yan [1]
Xu, Jun [1]
Li, Hang [1]
Affiliation
[1] Microsoft Research Asia, Beijing, People's Republic of China
Source
INFORMATION RETRIEVAL | 2010 / Vol. 13 / No. 4
Keywords
Learning to rank; Information retrieval; Benchmark datasets; Feature extraction
DOI
10.1007/s10791-009-9123-y
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
LETOR is a benchmark collection for research on learning to rank for information retrieval, released by Microsoft Research Asia. In this paper, we describe the details of the LETOR collection and show how it can be used in different kinds of research. Specifically, we describe how the document corpora and query sets in LETOR are selected, how the documents are sampled, how the learning features and meta information are extracted, and how the datasets are partitioned for comprehensive evaluation. We then compare several state-of-the-art learning to rank algorithms on LETOR, report their ranking performance, and discuss the results. After that, we discuss possible new research topics that LETOR can support in addition to algorithm comparison. We hope that this paper helps readers gain a deeper understanding of LETOR and enables more interesting research projects on learning to rank and related topics.
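The abstract refers to extracted learning features and to reporting ranking performance. As a minimal sketch, assuming the SVMlight-style line format used in LETOR releases (`<label> qid:<id> <fid>:<value> ... #comment`), the Python below parses such lines, groups documents by query, and scores a ranker with mean NDCG@k. The function names, the NDCG variant (gain 2^rel - 1, log2 position discount, a score of 0 for queries with no relevant documents), and the toy data are illustrative assumptions, not the paper's official evaluation tool.

```python
import math
from collections import defaultdict


def parse_letor_line(line):
    """Parse '<label> qid:<id> <fid>:<value> ... #comment' (assumed LETOR-style format)."""
    body = line.split("#", 1)[0].split()  # drop the trailing comment (e.g. docid)
    label = int(body[0])
    qid = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        fid, value = token.split(":", 1)
        features[int(fid)] = float(value)
    return label, qid, features


def dcg_at_k(labels, k):
    # Assumed NDCG variant: gain 2^rel - 1, discount log2(position + 1), positions from 1.
    return sum((2 ** rel - 1) / math.log2(pos + 2) for pos, rel in enumerate(labels[:k]))


def ndcg_at_k(scored, k):
    """scored: list of (model_score, relevance_label) pairs for one query."""
    ranked = [label for _, label in sorted(scored, key=lambda p: p[0], reverse=True)]
    ideal_dcg = dcg_at_k(sorted(ranked, reverse=True), k)
    # Assumption: a query with no relevant documents contributes 0.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(lines, score_fn, k=10):
    """Average NDCG@k over queries; score_fn maps a feature dict to a real score."""
    by_query = defaultdict(list)
    for line in lines:
        if line.strip():
            label, qid, feats = parse_letor_line(line)
            by_query[qid].append((score_fn(feats), label))
    return sum(ndcg_at_k(docs, k) for docs in by_query.values()) / len(by_query)


# Hypothetical toy data and a hypothetical single-feature ranker.
sample = [
    "2 qid:1 1:0.9 2:0.1 #docid = d1",
    "0 qid:1 1:0.2 2:0.8 #docid = d2",
    "1 qid:1 1:0.5 2:0.3 #docid = d3",
    "1 qid:2 1:0.4 2:0.6",
    "0 qid:2 1:0.7 2:0.2",
]
print(mean_ndcg(sample, lambda f: f.get(1, 0.0), k=10))
```

Here the ranker orders query 1 perfectly (NDCG 1.0) but places the relevant document second for query 2 (NDCG 1/log2(3)), so the mean is about 0.815. Splitting queries into train, validation, and test folds would sit on top of the per-query grouping in `mean_ndcg`.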
Pages: 346-374
Number of pages: 29
Related papers
50 in total
  • [41] Learning to Rank for Mathematical Formula Retrieval
    Mansouri, Behrooz
    Zanibbi, Richard
    Oard, Douglas W.
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021: 952-961
  • [42] The Research on Information Collection and Retrieval Technology of Big Data based Cloud Computing
    Chen, Hongjun
    Huang, Kun
    PROCEEDINGS OF THE 1ST INTERNATIONAL WORKSHOP ON CLOUD COMPUTING AND INFORMATION SECURITY (CCIS 2013), 2013, 52: 437-440
  • [43] Partial collection replication for information retrieval
    Lu, ZH
    McKinley, KS
    INFORMATION RETRIEVAL, 2003, 6(2): 159-198
  • [45] Extracting Emotion Causes Using Learning to Rank Methods From an Information Retrieval Perspective
    Xu, Bo
    Lin, Hongfei
    Lin, Yuan
    Diao, Yufeng
    Yang, Liang
    Xu, Kan
    IEEE ACCESS, 2019, 7: 15573-15583
  • [46] Using Learning to Rank Approach for Parallel Corpora Based Cross Language Information Retrieval
    Azarbonyad, Hosein
    Shakery, Azadeh
    Faili, Heshaam
    20TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE (ECAI 2012), 2012, 242: 79-84
  • [47] Learning to Rank for Information Retrieval Using Layered Multi-Population Genetic Programming
    Lin, Jung Yi
    Yeh, Jen-Yuan
    Liu, Chao Chung
    2012 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND CYBERNETICS (CYBERNETICSCOM), 2012: 45-49
  • [48] A protein classification benchmark collection for machine learning
    Sonego, Paolo
    Pacurar, Mircea
    Dhir, Somdutta
    Kertesz-Farkas, Attila
    Kocsor, Andras
    Gaspari, Zoltan
    Leunissen, Jack A. M.
    Pongor, Sandor
    NUCLEIC ACIDS RESEARCH, 2007, 35: D232-D236
  • [49] Weighted Rank Correlation in Information Retrieval Evaluation
    Melucci, Massimo
    INFORMATION RETRIEVAL TECHNOLOGY, PROCEEDINGS, 2009, 5839: 75-86
  • [50] Rank by Readability: Document Weighting for Information Retrieval
    Newbold, Neil
    McLaughlin, Harry
    Gillam, Lee
    ADVANCES IN MULTIDISCIPLINARY RETRIEVAL, 2010, 6107: 20-30