Reranking candidate gene models with cross-species comparison for improved gene prediction

Cited by: 5
Authors
Liu, Qian [1]
Crammer, Koby [1]
Pereira, Fernando C. N. [2]
Roos, David S. [3]
Affiliations
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Google Inc, Mountain View, CA USA
[3] Univ Penn, Dept Biol, Philadelphia, PA 19104 USA
DOI
10.1186/1471-2105-9-433
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc.). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features.

Results: We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+.

Conclusion: Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models.
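The reranking idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the linear scoring rule, the weight values, and the example numbers are all hypothetical, chosen only to show how a candidate with a lower gene-finder score can be promoted when it agrees better with a putative ortholog.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One of the K best gene models for a locus (all fields are illustrative)."""
    name: str
    base_score: float             # score from the original gene finder (assumed scale)
    cds_similarity: float         # coding-sequence similarity to the ortholog, 0..1
    splice_site_agreement: float  # fraction of splice sites shared with the ortholog, 0..1
    signal_peptide_match: bool    # signal peptide presence agrees with the ortholog


# Hand-picked weights for illustration only; they are not taken from the paper.
WEIGHTS = {"base": 1.0, "cds": 0.5, "splice": 0.5, "signal": 0.25}


def rerank_score(c: Candidate, w: dict = WEIGHTS) -> float:
    """Combine the gene-finder score with cross-species features (assumed linear rule)."""
    return (
        w["base"] * c.base_score
        + w["cds"] * c.cds_similarity
        + w["splice"] * c.splice_site_agreement
        + w["signal"] * float(c.signal_peptide_match)
    )


def rerank(candidates: list) -> list:
    """Return the candidates sorted from best to worst combined score."""
    return sorted(candidates, key=rerank_score, reverse=True)


# Made-up example: model_1 has the higher gene-finder score,
# but model_2 agrees better with the putative ortholog.
candidates = [
    Candidate("model_1", 0.90, 0.60, 0.50, False),
    Candidate("model_2", 0.85, 0.95, 1.00, True),
]

best_by_gene_finder = max(candidates, key=lambda c: c.base_score)
best_after_rerank = rerank(candidates)[0]
print(best_by_gene_finder.name, best_after_rerank.name)
```

In this toy setting the second model overtakes the first once the ortholog-based features are added. A real system would need to derive the features from alignments and choose the weights from training data rather than by hand.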
Pages: 12