A semiautomated approach to gene discovery through expressed sequence tag data mining: Discovery of new human transporter genes

Cited by: 13
Authors
Brown, S
Chang, JL
Sadee, W
Babbitt, PC
Affiliations
[1] Univ Calif San Francisco, Sch Pharm, Dept Pharmaceut Chem, San Francisco, CA 94143 USA
[2] Univ Calif San Francisco, Sch Pharm, Dept Biopharmaceut Sci, San Francisco, CA 94143 USA
[3] MIT, Whitehead Inst, Ctr Genome Res, Cambridge, MA 02141 USA
[4] Ohio State Univ, Med Ctr, Columbus, OH 43210 USA
Source
AAPS PHARMSCI | 2003 / Volume 5 / Issue 01
Keywords
major facilitator superfamily; transporters; superfamily analysis; expressed sequence tags; data mining;
DOI
10.1208/ps050101
Chinese Library Classification number
R9 [Pharmacy];
Subject classification code
1007;
Abstract
Identification and functional characterization of the genes in the human genome remain a major challenge. A principal source of publicly available information used for this purpose is the National Center for Biotechnology Information database of expressed sequence tags (dbEST), which contains over 4 million human ESTs. To extract the information buried in this data more effectively, we have developed a semiautomated method to mine dbEST for uncharacterized human genes. Starting with a single protein input sequence, a family of related proteins from all species is compiled. This entire family is then used to mine the human EST database for new gene candidates. Evaluation of putative new gene candidates in the context of a family of characterized proteins provides a framework for inference of the structure and function of the new genes. When applied to a test data set of 28 families within the major facilitator superfamily (MFS) of membrane transporters, our protocol found 73 previously characterized human MFS genes and 43 new MFS gene candidates. Development of this approach provided insights into the problems and pitfalls of automated data mining using public databases.
Pages: 18