Retrieval with gene queries

Cited by: 14
Authors
Sehgal, Aditya K. [1]
Srinivasan, Padmini
Affiliations
[1] Univ Iowa, Dept Comp Sci, Iowa City, IA 52246 USA
[2] Univ Iowa, Sch Lib & Informat Sci, Iowa City, IA 52246 USA
DOI
10.1186/1471-2105-7-220
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Accuracy of document retrieval from MEDLINE for gene queries is crucially important for many applications in bioinformatics. We explore five information retrieval-based methods to rank documents retrieved by PubMed gene queries for the human genome. The aim is to rank relevant documents higher in the retrieved list. We address the special challenges faced due to ambiguity in gene nomenclature: gene terms that refer to multiple genes, gene terms that are also English words, and gene terms that have other biological meanings.

Results: Our two baseline ranking strategies are quite similar in performance. Two of our three LocusLink-based strategies offer significant improvements. These methods work very well even when there is ambiguity in the gene terms. Our best ranking strategy offers significant improvements on three different kinds of ambiguities over our two baseline strategies (improvements range from 15.9% to 17.7% and 11.7% to 13.3% depending on the baseline). For most genes the best ranking query is one that is built from the LocusLink (now Entrez Gene) summary and product information along with the gene names and aliases. For others, the gene names and aliases suffice. We also present an approach that successfully predicts, for a given gene, which of these two ranking queries is more appropriate.

Conclusion: We explore the effect of different post-retrieval strategies on the ranking of documents returned by PubMed for human gene queries. We have successfully applied some of these strategies to improve the ranking of relevant documents in the retrieved sets. This holds true even when various kinds of ambiguity are encountered. We feel that it would be very useful to apply strategies like ours on PubMed search results as these are not ordered by relevance in any way. This is especially so for queries that retrieve a large number of documents.
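
The post-retrieval re-ranking idea in the abstract can be illustrated with a minimal sketch. It assumes a plain TF-IDF cosine similarity as the scoring model, which the abstract does not specify, so it should not be read as the authors' method. The function names (`build_query`, `rank`), the toy documents, and the summary text are all hypothetical. The sketch shows how a ranking query built from gene names and aliases can optionally be extended with summary and product text, and how an already retrieved document set can then be reordered against it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_query(names, aliases, summary="", product=""):
    """Join gene names and aliases, optionally with summary and product text."""
    return " ".join([*names, *aliases, summary, product])


def rank(query, docs):
    """Return document indices ordered by TF-IDF cosine similarity to the query."""
    doc_tokens = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency of each term across the retrieved set.
    df = Counter(t for toks in doc_tokens for t in set(toks))
    # Smoothed inverse document frequency (an assumed weighting choice).
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}

    def vec(tokens):
        # Terms that never occur in the retrieved set are ignored.
        tf = Counter(t for t in tokens if t in idf)
        return {t: f * idf[t] for t, f in tf.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(tokenize(query))
    scores = [cosine(q, vec(toks)) for toks in doc_tokens]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)


# Toy retrieved set for an ambiguous gene term that is also an English word.
docs = [
    "The cat sat on the mat during a behavioural study of domestic animals",
    "Catalase decomposes hydrogen peroxide and protects cells against oxidative stress",
    "CAT expression was measured in liver tissue",
]
# Hypothetical summary text standing in for a gene database summary field.
summary = ("Catalase is an antioxidant enzyme that decomposes hydrogen peroxide "
           "and defends cells against oxidative stress")

names_only = build_query(["CAT"], ["catalase"])
expanded = build_query(["CAT"], ["catalase"], summary=summary)
print(rank(names_only, docs))
print(rank(expanded, docs))
```

In this toy set the expanded query shares many more terms with the document about the enzyme than with the document about the animal, which is the intuition behind adding summary and product text when a gene term is ambiguous.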
Pages: 12