Analysis of canonical and non-canonical splice sites in mammalian genomes

Times cited: 449
Authors
Burset, M [1]
Seledtsov, IA [1]
Solovyev, VV [1]
Affiliation
[1] Sanger Ctr, Informat Div, Cambridge CB10 1SA, England
Funding
Wellcome Trust (UK)
DOI
10.1093/nar/28.21.4364
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification codes
071010; 081704
Abstract
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain canonical dinucleotides: GT and AG for donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (with a maximum size of 0.05%). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites with the sequences deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. After corrections, they can be classified as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG); 61 errors that were corrected to GT-AG canonical pairs; six AT-AC pairs (of which two were errors that corrected to AT-AC); one case produced from a non-existent intron; seven cases found in HTG that were deposited to GenBank; and finally only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation is true for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and finally only 0.02% could consist of other types of non-canonical splice sites.
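The extrapolation in the last sentence can be checked with a short calculation. The sketch below is one plausible reading, not the authors' stated procedure: it assumes the non-canonical fraction (290 of 22 489 pairs) is redistributed in proportion to the HTG-verified outcomes, and that the one non-existent intron and the seven HTG-derived entries are left out of the denominator (148 rather than 156). The variable and function names are illustrative.

```python
# Counts taken from the abstract.
TOTAL_EST_SUPPORTED = 22489
CANONICAL_GT_AG = 22199
NON_CANONICAL = TOTAL_EST_SUPPORTED - CANONICAL_GT_AG  # 290 pairs

# Outcome of re-checking human non-canonical sites against HTG sequences.
# Assumption: the non-existent intron (1) and the HTG-derived entries (7)
# are excluded, leaving 148 informative cases out of 156.
HTG_OUTCOMES = {"GT-AG": 61, "GC-AG": 79, "AT-AC": 6, "other": 2}


def extrapolate():
    """Redistribute the non-canonical share according to the HTG outcomes."""
    informative = sum(HTG_OUTCOMES.values())
    non_canonical_pct = 100.0 * NON_CANONICAL / TOTAL_EST_SUPPORTED
    shares = {
        kind: non_canonical_pct * count / informative
        for kind, count in HTG_OUTCOMES.items()
    }
    # Sites annotated as canonical stay GT-AG; corrected errors are added on top.
    shares["GT-AG"] += 100.0 * CANONICAL_GT_AG / TOTAL_EST_SUPPORTED
    return {kind: round(pct, 2) for kind, pct in shares.items()}


if __name__ == "__main__":
    for kind, pct in extrapolate().items():
        print(f"{kind}: {pct:.2f}%")
```

Under these assumptions the rounded shares agree with the 99.24%, 0.69%, 0.05% and 0.02% quoted above; using all 156 matched cases as the denominator gives slightly different values.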
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus ~600) and, finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
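As an illustration of what a splice site weight matrix is, the following minimal sketch builds a log-odds matrix from aligned site sequences. It is a generic construction, not the matrices from the paper: the uniform background, the pseudocount, and the toy donor sequences are all assumptions made for the example.

```python
import math
from collections import Counter


def build_weight_matrix(sites, background=0.25, pseudocount=1.0):
    """Return one dict of log2-odds scores (A, C, G, T) per aligned position."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for pos in range(length):
        counts = Counter(site[pos] for site in sites)
        matrix.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def score(matrix, seq):
    """Sum the per-position log-odds scores of a candidate site."""
    return sum(column[base] for column, base in zip(matrix, seq))


# Hypothetical toy donor sites (2 exonic + 6 intronic bases), for illustration only.
toy_donor_sites = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG", "TGGTGAGT"]
pwm = build_weight_matrix(toy_donor_sites)
```

A gene prediction program would typically score each candidate site with such a matrix and compare the result against a threshold; here `score(pwm, "AGGTAAGT")` is higher than `score(pwm, "AGGCAAGT")` because no toy site has C at the fourth position.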
Pages: 4364-4375
Page count: 12