Computational prediction of eukaryotic protein-coding genes

Author
Michael Q. Zhang
Affiliations
[1] Watson School of Biological Sciences
[2] Cold Spring Harbor Laboratory
Source
Nature Reviews Genetics | 2002 / Vol. 3
Abstract
With the recent explosion in the availability of genome data, gene-finding programs have proliferated. However, the accuracy with which genes can be predicted is still far from satisfactory. This review provides background information and surveys the latest developments in gene-prediction programs. It also highlights the problems that face the gene-prediction field and discusses future research goals.
The main characteristic of a eukaryotic gene is its organization into exons and introns. The 'exon-definition' model explains how the splicing machinery recognizes exons in a sea of intronic DNA. It indicates that an internal exon is initially recognized by a chain of interacting splicing factors that span it. The binding of these factors to pre-mRNA is responsible for the non-random nucleotide patterns that form the molecular basis of all exon-recognition algorithms.
Correctly identifying the boundaries of a gene is essential when searching for several genes in a large genomic region. It is relatively easy to find internal exons, but many gene-prediction programs fail to identify gene boundaries. Determining the 3′ end of a gene is easier than determining its 5′ end, mainly because of the difficulty of identifying the promoter and transcriptional start-site sequences, and because the 5′ ends of cDNA sequences are often truncated.
As current gene-prediction programs are biased towards intron-containing genes, many intronless genes might have been missed by such programs. Many false-positive exon predictions have also been caused by pseudogenes. Developing better and more specialized algorithms to recognize them is becoming increasingly important.
Hidden Markov model (HMM)-based programs can be used to predict multiple genes, partial genes and genes on both strands, all at the same time. These features are essential when annotating genomes or large chunks of sequence data, such as large contigs, in an automated fashion.
By comparing the genomes of several closely related species, conserved regulatory regions can be identified easily. For these reasons, making use of comparative genomic data is an important future challenge for the gene-prediction field. More functional genomics methods for finding genes are desperately needed to improve gene prediction. Only with sufficient mechanistic data can gene prediction be transformed from being statistical to being biological in nature. The field is working towards the ultimate dynamic model that can identify the consecutive exons of a gene, from its 5′ to its 3′ ends, as if they were being co-transcriptionally recognized and spliced.
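To make the HMM idea in the abstract concrete, the following is a minimal sketch of Viterbi decoding over a two-state model that labels each nucleotide as exon or intron. It is an illustration only, not the method of any program surveyed in the review: the state set, the function name `viterbi`, and every probability in `START`, `TRANS`, and `EMIT` are hypothetical values chosen for the example. Real gene finders typically use far richer models, with states for splice sites, codon phase, both strands, and explicit length distributions.

```python
import math

STATES = ("exon", "intron")

# Hypothetical toy parameters, chosen for illustration only.
# Exons are assumed slightly GC-rich and introns slightly AT-rich.
START = {"exon": 0.5, "intron": 0.5}
TRANS = {
    "exon": {"exon": 0.9, "intron": 0.1},
    "intron": {"exon": 0.1, "intron": 0.9},
}
EMIT = {
    "exon": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "intron": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}


def viterbi(seq):
    """Return the most probable state path for seq under the toy HMM."""
    if not seq:
        return []
    # scores[s] is the best log-probability of any path ending in state s.
    scores = {
        s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES
    }
    backpointers = []
    for base in seq[1:]:
        new_scores = {}
        pointers = {}
        for s in STATES:
            # Pick the predecessor state that maximizes the path score.
            best_prev = max(
                STATES, key=lambda p: scores[p] + math.log(TRANS[p][s])
            )
            new_scores[s] = (
                scores[best_prev]
                + math.log(TRANS[best_prev][s])
                + math.log(EMIT[s][base])
            )
            pointers[s] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

Calling `viterbi("GCGCGCGCGCATATATATAT")` returns one label per nucleotide. Working in log space avoids numerical underflow on long sequences, which matters when decoding contig-sized inputs.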
Pages: 698-709