Gene Model Annotations for Drosophila melanogaster: Impact of High-Throughput Data

Cited by: 44
Authors
Matthews, Beverley B. [1 ]
dos Santos, Gilberto [1 ]
Crosby, Madeline A. [1 ]
Emmert, David B. [1 ]
St Pierre, Susan E. [1 ]
Gramates, L. Sian [1 ]
Zhou, Pinglei [1 ]
Schroeder, Andrew J. [1 ]
Falls, Kathleen [1 ]
Strelets, Victor [2 ]
Russo, Susan M. [1 ]
Gelbart, William M. [1 ]
Affiliations
[1] Harvard Univ, Dept Mol & Cellular Biol, Cambridge, MA 02138 USA
[2] Indiana Univ, Dept Biol, Bloomington, IN 47405 USA
[3] Univ New Mexico, Dept Biol, Albuquerque, NM 87131 USA
Source
G3-GENES GENOMES GENETICS | 2015 / Vol. 5 / Issue 08
Funding
UK Medical Research Council; US National Institutes of Health;
Keywords
transcriptome; alternative splice; lncRNA; transcription start site; exon junction; OPEN READING FRAMES; POLYCISTRONIC MESSENGER-RNA; MOLECULAR EVOLUTION; REFERENCE SEQUENCE; GENOME ANNOTATION; ENDOGENOUS SIRNAS; IDENTIFICATION; REVEALS; EXPRESSION; TRANSCRIPTS;
DOI
10.1534/g3.115.018929
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
We report the current status of the FlyBase annotated gene set for Drosophila melanogaster and highlight improvements based on high-throughput data. The FlyBase annotated gene set consists entirely of manually annotated gene models, with the exception of some classes of small non-coding RNAs. All gene models have been reviewed using evidence from high-throughput datasets, primarily from the modENCODE project. These datasets include RNA-Seq coverage data, RNA-Seq junction data, transcription start site profiles, and translation stop-codon read-through predictions. New annotation guidelines were developed to take into account the use of the high-throughput data. We describe how this flood of new data was incorporated into thousands of new and revised annotations. FlyBase has adopted a philosophy of excluding low-confidence and low-frequency data from gene model annotations; we also do not attempt to represent all possible permutations for complex and modularly organized genes. This has allowed us to produce a high-confidence, manageable gene annotation dataset that is available at FlyBase (http://flybase.org). Interesting aspects of new annotations include new genes (coding, non-coding, and antisense), many genes with alternative transcripts with very long 3' UTRs (up to 15-18 kb), and a stunning mismatch in the number of male-specific genes (approximately 13% of all annotated gene models) vs. female-specific genes (less than 1%). The number of identified pseudogenes and mutations in the sequenced strain also increased significantly. We discuss remaining challenges, for instance, identification of functional small polypeptides and detection of alternative translation starts.
Pages: 1721 - 1736
Number of pages: 16
Related papers
50 in total
  • [41] A High-Throughput Method for Quantifying Drosophila Fecundity
    Gomez, Andreana
    Gonzalez, Sergio
    Oke, Ashwini
    Luo, Jiayu
    Duong, Johnny B.
    Esquerra, Raymond M.
    Zimmerman, Thomas
    Capponi, Sara
    Fung, Jennifer C.
    Nystul, Todd G.
    TOXICS, 2024, 12 (09)
  • [42] HIGH-THROUGHPUT AND NON-INVASIVE FUNCTIONAL DRUG SCREENING PLATFORM FOR DROSOPHILA MELANOGASTER MODELS OF NEPHROLITHIASIS
    Ali, Sohrab Naushad
    Kim, Jihye
    Spagnuolo, Paul
    Razvi, Hassan
    Leong, Hon
    JOURNAL OF UROLOGY, 2016, 195 (04): : E884 - E884
  • [43] High-throughput sample handling and data collection at synchrotrons: embedding the ESRF into the high-throughput gene-to-structure pipeline
    Beteva, A.
    Cipriani, F.
    Cusack, S.
    Delageniere, S.
    Gabadinho, J.
    Gordon, E. J.
    Guijarro, M.
    Hall, D. R.
    Larsen, S.
    Launer, L.
    Lavault, C. B.
    Leonard, G. A.
    Mairs, T.
    McCarthy, A.
    McCarthy, J.
    Meyer, J.
    Mitchell, E.
    Monaco, S.
    Nurizzo, D.
    Pernot, P.
    Pieritz, R.
    Ravelli, R. G. B.
    Rey, V.
    Shepard, W.
    Spruce, D.
    Stuart, D. I.
    Svensson, O.
    Theveneau, P.
    Thibault, X.
    Turkenburg, J.
    Walsh, M.
    McSweeney, S. M.
    ACTA CRYSTALLOGRAPHICA SECTION D-STRUCTURAL BIOLOGY, 2006, 62 : 1162 - 1169
  • [44] nEASE: a method for gene ontology subclassification of high-throughput gene expression data
    Chittenden, Thomas W.
    Howe, Eleanor A.
    Taylor, Jennifer M.
    Mar, Jessica C.
    Aryee, Martin J.
    Gomez, Harold
    Sultana, Razvan
    Braisted, John
    Nair, Sarita J.
    Quackenbush, John
    Holmes, Chris
    BIOINFORMATICS, 2012, 28 (05) : 726 - 728
  • [45] High throughput screening of VSD candidate genes with the help of powerful model Drosophila melanogaster
    von der Decken, Isabel
    Gutierrez, Daniel Rodriguez
    Sotillos, Sol
    Sprecher, Simon
    Biason-Lauber, Anna
    SEXUAL DEVELOPMENT, 2024, 17 : 31 - 31
  • [46] High throughput screening of DSD candidate genes with the help of the powerful model Drosophila melanogaster
    von der Decken, Isabel
    Rodriguez Gutierrez, Daniel
    Sotillos, Sol
    Castelli-Gair Hombria, James
    Sprecher, Simon
    Lauber, Anna
    HORMONE RESEARCH IN PAEDIATRICS, 2021, 94 (SUPPL 1): : 391 - 391
  • [47] Model based heritability scores for high-throughput sequencing data
    Pratyaydipta Rudra
    W. Jenny Shi
    Brian Vestal
    Pamela H. Russell
    Aaron Odell
    Robin D. Dowell
    Richard A. Radcliffe
    Laura M. Saba
    Katerina Kechris
    BMC Bioinformatics, 18
  • [48] BioAssay Ontology Annotations Facilitate Cross-Analysis of Diverse High-Throughput Screening Data Sets
    Schuerer, Stephan C.
    Vempati, Uma
    Smith, Robin
    Southern, Mark
    Lemmon, Vance
    JOURNAL OF BIOMOLECULAR SCREENING, 2011, 16 (04) : 415 - 426
  • [49] Model based heritability scores for high-throughput sequencing data
    Rudra, Pratyaydipta
    Shi, W. Jenny
    Vestal, Brian
    Russell, Pamela H.
    Odell, Aaron
    Dowell, Robin D.
    Radcliffe, Richard A.
    Saba, Laura M.
    Kechris, Katerina
    BMC BIOINFORMATICS, 2017, 18
  • [50] Clinical Impact of High-Throughput Gene Expression Studies in Lung Cancer
    Beane, Jennifer
    Spira, Avrum
    Lenburg, Marc E.
    JOURNAL OF THORACIC ONCOLOGY, 2009, 4 (01) : 109 - 118