PaperBLAST: Text Mining Papers for Information about Homologs

Cited by: 73
Authors
Price, Morgan N. [1]
Arkin, Adam P. [1]
Affiliations
[1] Lawrence Berkeley Natl Lab, Environm Genom & Syst Biol, Berkeley, CA 94720 USA
Keywords
annotation; text mining; PROTEIN FUNCTION; DATABASE; GENOMES; IDENTIFICATION; RESOURCE; SEARCH; GENES; KEGG;
DOI
10.1128/mSystems.00039-17
Chinese Library Classification
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
Large-scale genome sequencing has identified millions of protein-coding genes whose function is unknown. Many of these proteins are similar to characterized proteins from other organisms, but much of this information is missing from annotation databases and is hidden in the scientific literature. To make this information accessible, PaperBLAST uses EuropePMC to search the full text of scientific articles for references to genes. PaperBLAST also takes advantage of curated resources (Swiss-Prot, GeneRIF, and EcoCyc) that link protein sequences to scientific articles. PaperBLAST's database includes over 700,000 scientific articles that mention over 400,000 different proteins. Given a protein of interest, PaperBLAST quickly finds similar proteins that are discussed in the literature and presents snippets of text from relevant articles or from the curators. PaperBLAST is available at http://papers.genomics.lbl.gov/.

IMPORTANCE With the recent explosion of genome sequencing data, there are now millions of uncharacterized proteins. If a scientist becomes interested in one of these proteins, it can be very difficult to find information about its likely function. Often a protein whose sequence is similar, and which is likely to have a similar function, has been studied already, but this information is not available in any database. To help find articles about similar proteins, PaperBLAST searches the full text of scientific articles for protein identifiers or gene identifiers, and it links these articles to protein sequences. Then, given a protein of interest, it can quickly find similar proteins in its database by using standard software (BLAST), and it can show snippets of text from relevant papers. We hope that PaperBLAST will make it easier for biologists to predict proteins' functions.
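
The workflow described in the abstract (find identifier mentions in article text, link them to sequences, then retrieve similar proteins with snippets) can be illustrated with a toy sketch. This is not PaperBLAST's implementation: the identifiers, sequences, and article texts are invented, the names `find_mentions`, `similarity`, and `search` are hypothetical, no EuropePMC query is made, and a crude k-mer Jaccard score stands in for BLAST.

```python
import re

# Hypothetical toy data: identifiers, sequences, and article text are invented.
PROTEINS = {
    "b0001": "MKRISTTITTTITITTGNGAG",
    "PA0002": "MKRISTTLTTTITVTTGNGAG",
    "YAL003W": "MASTDFSKIETLKQLNASLAD",
}
ARTICLES = {
    "article-1": "We show that deletion of PA0002 abolishes growth on acetate.",
    "article-2": "Expression of YAL003W increased under heat stress.",
}


def find_mentions(articles, identifiers, width=30):
    """Map each identifier to (article id, snippet) pairs where it appears."""
    mentions = {}
    for ident in identifiers:
        pattern = re.compile(r"\b" + re.escape(ident) + r"\b")
        for art_id, text in articles.items():
            match = pattern.search(text)
            if match:
                start = max(0, match.start() - width)
                end = min(len(text), match.end() + width)
                mentions.setdefault(ident, []).append((art_id, text[start:end]))
    return mentions


def kmers(seq, k=3):
    """Return the set of overlapping k-letter substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of k-mer sets: a crude stand-in for BLAST."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def search(query_seq, proteins, mentions, min_sim=0.3):
    """Return (score, identifier, snippets) for similar proteins with papers."""
    hits = []
    for ident, seq in proteins.items():
        if ident not in mentions:
            continue  # only proteins discussed in some article are reported
        score = similarity(query_seq, seq)
        if score >= min_sim:
            hits.append((score, ident, mentions[ident]))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


if __name__ == "__main__":
    mentions = find_mentions(ARTICLES, PROTEINS)
    for score, ident, snippets in search(PROTEINS["b0001"], PROTEINS, mentions):
        print(ident, round(score, 2), snippets)
```

In this toy data the query protein itself appears in no article, so only a similar protein that is mentioned in the text is reported, which mirrors the use case the abstract describes. A real pipeline would replace `similarity` with an alignment tool such as BLAST and would need far more careful identifier matching.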
Pages: 10