Retrieval with gene queries

Cited by: 14
Authors
Sehgal, Aditya K. [1]
Srinivasan, Padmini
Affiliations
[1] Univ Iowa, Dept Comp Sci, Iowa City, IA 52246 USA
[2] Univ Iowa, Sch Lib & Informat Sci, Iowa City, IA 52246 USA
DOI
10.1186/1471-2105-7-220
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges faced due to ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

Results: Our two baseline ranking strategies are quite similar in performance. Two of our three LocusLink-based strategies offer significant improvements. These methods work very well even when there is ambiguity in the gene terms. Our best ranking strategy offers significant improvements on three different kinds of ambiguities over our two baseline strategies (improvements range from 15.9% to 17.7% and 11.7% to 13.3% depending on the baseline). For most genes the best ranking query is one that is built from the LocusLink (now Entrez Gene) summary and product information along with the gene names and aliases. For others, the gene names and aliases suffice. We also present an approach that successfully predicts, for a given gene, which of these two ranking queries is more appropriate.

Conclusion: We explore the effect of different post-retrieval strategies on the ranking of documents returned by PubMed for human gene queries. We have successfully applied some of these strategies to improve the ranking of relevant documents in the retrieved sets. This holds true even when various kinds of ambiguity are encountered. We feel that it would be very useful to apply strategies like ours on PubMed search results as these are not ordered by relevance in any way. This is especially so for queries that retrieve a large number of documents.
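
The post-retrieval idea in the abstract (build a ranking query from a gene's names, aliases, summary, and product text, then reorder an already retrieved document set against it) can be illustrated with a small sketch. The abstract does not state which scoring function the authors used, so plain TF-IDF cosine similarity is swapped in here as an assumption. The function names `build_ranking_query` and `rank_documents`, the toy documents, and the short gene description are all hypothetical illustrations, not data or code from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_ranking_query(names, aliases, summary="", product=""):
    """Join gene fields into one bag-of-words ranking query (hypothetical helper)."""
    return " ".join([*names, *aliases, summary, product])


def rank_documents(query, docs):
    """Return document indices ordered by TF-IDF cosine similarity to the query.

    Assumption: TF-IDF cosine is a stand-in scoring function; the abstract
    does not say how documents were actually scored.
    """
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in doc_tokens for t in set(toks))
    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tokens):
        tf = Counter(tokens)
        # Terms never seen in the retrieved set cannot affect the ranking.
        return {t: f * idf[t] for t, f in tf.items() if t in idf}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    q_vec = vectorize(tokenize(query))
    scores = [cosine(q_vec, vectorize(toks)) for toks in doc_tokens]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Toy retrieved set for an ambiguous gene symbol that is also an English word.
docs = [
    "The cat sat on the mat while the dog slept.",
    "Catalase (CAT) decomposes hydrogen peroxide and protects cells from oxidative stress.",
    "A survey of household pets including the cat.",
]
query = build_ranking_query(
    names=["CAT"],
    aliases=["catalase"],
    summary="antioxidant enzyme that converts hydrogen peroxide to water and oxygen",
    product="catalase",
)
ranking = rank_documents(query, docs)
print(ranking)  # the gene-related document (index 1) is ranked first
```

In this toy set a names-only query ("CAT") would match all three documents equally on the shared token, which is the ambiguity the abstract describes; the extra summary and product terms are what separate the gene-related document from the others.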
Pages: 12
Source: BMC Bioinformatics, 7