Retrieval with gene queries

Times cited: 14

Authors:

Sehgal, Aditya K. [1]

Srinivasan, Padmini

Affiliations:

[1] Univ Iowa, Dept Comp Sci, Iowa City, IA 52246 USA

[2] Univ Iowa, Sch Lib & Informat Sci, Iowa City, IA 52246 USA

Source:

BMC BIOINFORMATICS | 2006 / Vol. 7 / No. 1

DOI:

10.1186/1471-2105-7-220

Chinese Library Classification (CLC) number:

Q5 [Biochemistry]

Discipline classification codes:

071010; 081704

Abstract:

Background: Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges faced due to ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

Results: Our two baseline ranking strategies are quite similar in performance. Two of our three LocusLink-based strategies offer significant improvements. These methods work very well even when there is ambiguity in the gene terms. Our best ranking strategy offers significant improvements on three different kinds of ambiguities over our two baseline strategies (improvements range from 15.9% to 17.7% and 11.7% to 13.3% depending on the baseline). For most genes the best ranking query is one that is built from the LocusLink (now Entrez Gene) summary and product information along with the gene names and aliases. For others, the gene names and aliases suffice. We also present an approach that successfully predicts, for a given gene, which of these two ranking queries is more appropriate.

Conclusion: We explore the effect of different post-retrieval strategies on the ranking of documents returned by PubMed for human gene queries. We have successfully applied some of these strategies to improve the ranking of relevant documents in the retrieved sets. This holds true even when various kinds of ambiguity are encountered. We feel that it would be very useful to apply strategies like ours on PubMed search results as these are not ordered by relevance in any way. This is especially so for queries that retrieve a large number of documents.
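The abstract's best-performing ranking query combines a gene's names and aliases with its LocusLink summary and product text, then re-orders the documents PubMed already returned. The sketch below shows one way such a post-retrieval re-ranking could look, assuming a plain TF-IDF vector-space model with cosine similarity; the abstract does not state the authors' weighting scheme. The helpers `build_gene_query` and `rank_documents`, the document IDs, and all example texts are hypothetical illustrations, not the authors' code or data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens (TP53 becomes 'tp53')."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_gene_query(names: list[str], aliases: list[str], summary: str = "", product: str = "") -> str:
    """Hypothetical helper: concatenate the gene-record fields named in the abstract into one query text."""
    return " ".join([*names, *aliases, summary, product])


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_documents(query_text: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Re-rank retrieved documents by TF-IDF cosine similarity to the query (assumed scheme)."""
    ids = list(docs)
    token_lists = [tokenize(docs[doc_id]) for doc_id in ids]
    n = len(ids)
    # Document frequency is counted over the retrieved set only.
    df = Counter(term for tokens in token_lists for term in set(tokens))

    def weigh(tokens: list[str]) -> dict[str, float]:
        # Raw term frequency times a smoothed IDF, so query terms absent from every document stay finite.
        return {t: c * (math.log((n + 1) / (df[t] + 1)) + 1.0) for t, c in Counter(tokens).items()}

    query_vec = weigh(tokenize(query_text))
    scored = [(doc_id, cosine(query_vec, weigh(tokens))) for doc_id, tokens in zip(ids, token_lists)]
    # Highest similarity first; Python's sort is stable, so ties keep the original retrieval order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy example: every string below is made up for illustration, not data from the paper.
query = build_gene_query(
    names=["TP53"],
    aliases=["p53", "LFS1"],
    summary="tumor suppressor that induces cell cycle arrest or apoptosis",
    product="cellular tumor antigen p53",
)
docs = {
    "doc_a": "TP53 tumor suppressor protein p53 regulates apoptosis and cell cycle arrest",
    "doc_b": "The CAT scan showed no abnormality in the patient",
    "doc_c": "Cellular tumor antigen p53 mutations in human cancers",
}
ranking = rank_documents(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}\t{score:.3f}")
```

Calling `build_gene_query` with only `names` and `aliases` yields the names-and-aliases-only variant that, per the abstract, suffices for some genes. Choosing between the two query types for a given gene is the prediction problem the authors address, which this sketch does not attempt.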

Pages: 12

Related records (50 in total):

[1] Retrieval with gene queries
Sehgal, Aditya K.
Srinivasan, Padmini
[J]. BMC BIOINFORMATICS, 2006, 7
[2] Information Retrieval with Verbose Queries
Gupta, Manish
Bendersky, Michael
[J]. FOUNDATIONS AND TRENDS IN INFORMATION RETRIEVAL, 2015, 9 (3-4): 209-354
[3] Image retrieval by partial queries
Grecu, H
Lambert, P
[J]. 2001 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL I, PROCEEDINGS, 2001: 26-29
[4] Image retrieval in multipoint queries
Vu, Khanh
Cheng, Hao
Hua, Kien A.
[J]. INTERNATIONAL JOURNAL OF IMAGING SYSTEMS AND TECHNOLOGY, 2008, 18 (2-3): 170-181
[5] Information Retrieval with Verbose Queries
Gupta, Manish
Bendersky, Michael
[J]. SIGIR 2015: PROCEEDINGS OF THE 38TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2015: 1121-1124
[6] Corrupted Queries in Text Retrieval
Otero Pombo, Juan
Vilares Ferro, Jesus
Vilares Ferro, Manuel
[J]. PROCESAMIENTO DEL LENGUAJE NATURAL, 2009, (42): 9-16
[7] Image retrieval with local and spatial queries
Moghaddam, B
Biermann, H
Margaritis, D
[J]. 2000 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL II, PROCEEDINGS, 2000: 542-545
[8] Information Retrieval from Database Queries
Catao, Vladimir Soares
Sampaio, Marcus Costa
Schiel, Ulrich
[J]. 2014 IEEE/ACS 11TH INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS AND APPLICATIONS (AICCSA), 2014: 507-514
[9] A Study of Retrieval Models for Long Documents and Queries in Information Retrieval
Cummins, Ronan
[J]. PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016: 795-805
[10] Enabling soft queries for data retrieval
Yu, Hwanjo
Hwang, Seung-Won
Chang, Kevin Chen-Chuan
[J]. INFORMATION SYSTEMS, 2007, 32 (04): 560-574