Statistical assessment of discriminative features for protein-coding and non coding cross-species conserved sequence elements

Times cited: 2
Authors
Creanza, Teresa M. [1, 2]
Horner, David S. [2]
D'Addabbo, Annarita [1]
Maglietta, Rosalia [1]
Mignone, Flavio [3]
Ancona, Nicola [1]
Pesole, Graziano [4, 5]
Affiliations
[1] CNR, Ist Studi Sistemi Intelligenti Automaz, I-70126 Bari, Italy
[2] Univ Milan, Dipartimento Sci Biomol & Biotecnol, Milan, Italy
[3] Univ Milan, Dipartimento Chim Strutturale & Stereochim Inorga, Milan, Italy
[4] Univ Bari, Dipartimento Biochim & Biol Mol, Bari, Italy
[5] CNR, Ist Tecnol Biomed, I-70126 Bari, Italy
Source
BMC BIOINFORMATICS | 2009 / Vol. 10
Keywords
IDENTIFICATION; TOOL; REGIONS; SEARCH; MOUSE; BLAST; TAGS; RAT;
DOI
10.1186/1471-2105-10-S6-S2
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: The identification of protein-coding elements in sets of mammalian conserved elements is one of the major challenges in current molecular biology research. Many features have been proposed for automatically distinguishing coding from non-coding conserved sequences, which makes a systematic statistical assessment of their differences necessary. A comprehensive study should comprise an association study, i.e. a comparison of the distributions of the features in the two classes, and a prediction study, in which the prediction accuracies of classifiers trained on single features and on groups of features are analyzed conditional on the compared species and on the sequence lengths.
Results: In this paper we compared the distributions of a set of comparative and non-comparative features and evaluated the prediction accuracy of classifiers trained to discriminate sequence elements conserved among human, mouse and rat. The association study showed that the analyzed features are statistically different between the two classes. To study the influence of sequence length on feature performance, a prediction study was performed on several data sets, each composed of equal numbers of length-matched coding and non-coding alignments, with the data sets spanning an increasing average alignment length. We found that the most discriminant feature was a comparative measure: the proportion of synonymous nucleotide substitutions per synonymous site. Moreover, linear discriminant classifiers trained on comparative features generally outperformed classifiers based on intrinsic ones. Finally, adding intrinsic features to the set of input variables significantly increased the prediction accuracy of classifiers trained on comparative features, independently of sequence length (Kolmogorov-Smirnov P-value ≤ 0.05).
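The abstract reports a Kolmogorov-Smirnov P-value, and its association study compares how a feature is distributed in coding versus non-coding elements. The following is a minimal sketch of a textbook two-sample Kolmogorov-Smirnov test in plain Python. The statistic D is the largest gap between the two empirical CDFs. The p-value uses the asymptotic Kolmogorov distribution with the small-sample correction popularized by Numerical Recipes. This is an illustration under those assumptions, not the authors' code. The function names (`ks_statistic`, `ks_pvalue`) and the toy feature values are hypothetical.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample KS statistic D = max_x |F_a(x) - F_b(x)| over the pooled sample."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    # Empirical CDFs only jump at observed values, so checking every pooled value suffices.
    return max(abs(bisect_right(a, x) / n1 - bisect_right(b, x) / n2) for x in a + b)

def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value Q_KS(lambda), with a small-sample correction on lambda."""
    ne = n1 * n2 / (n1 + n2)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        # Q_KS is indistinguishable from 1 here, and the alternating series converges slowly.
        return 1.0
    q = sum(2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
            for j in range(1, terms + 1))
    return min(max(q, 0.0), 1.0)

# Hypothetical toy values of one feature in the two classes (not data from the paper).
coding = [0.12, 0.18, 0.21, 0.25, 0.30, 0.33, 0.41, 0.45]
noncoding = [0.35, 0.47, 0.52, 0.58, 0.61, 0.66, 0.72, 0.80]
d = ks_statistic(coding, noncoding)
p = ks_pvalue(d, len(coding), len(noncoding))
print(f"D = {d:.3f}, asymptotic p = {p:.4f}")
```

The asymptotic p-value is only approximate for samples this small. In practice, an exact or permutation-based p-value, or a library routine, would be the safer choice.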
Conclusion: We observed distinct and consistent patterns for the individual and combined use of comparative and intrinsic classifiers, both across different sequence/alignment lengths and with respect to error rates in the classification of coding and non-coding elements. In particular, we noted that comparative features tend to be more accurate in classifying coding sequences; this is likely because such features capture the deviations from strictly neutral evolution that are expected as a consequence of the characteristics of the genetic code.
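The abstract identifies the proportion of synonymous substitutions per synonymous site as the most discriminant feature, and it ties the strength of comparative features to the genetic code. The following is a minimal sketch of how such a proportion (pS) can be computed for one pair of aligned, in-frame coding sequences. It assumes Nei-Gojobori-style counting with the standard genetic code (NCBI table 1), and it counts changes to stop codons as nonsynonymous. It also skips codons containing gaps or stop codons, and applies no multiple-hit correction. The abstract does not state the authors' exact estimator. The sketch handles only a pairwise comparison, whereas the study covers three species. All function names and the toy sequences are hypothetical.

```python
from itertools import permutations, product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order; '*' marks stops.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def syn_sites(codon):
    """Synonymous sites in a codon (0..3): each synonymous single-base change counts 1/3."""
    aa = CODE[codon]
    synonymous = 0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                # Changes to stop codons never match aa, so they count as nonsynonymous.
                synonymous += int(CODE[mutant] == aa)
    return synonymous / 3

def syn_differences(c1, c2):
    """Synonymous differences between two codons, averaged over all single-step
    mutational pathways that avoid intermediate stop codons."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    counts = []
    for order in permutations(diff):
        cur, syn, valid = c1, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            syn += int(CODE[nxt] == CODE[cur])
            cur = nxt
        if valid:
            counts.append(syn)
    return sum(counts) / len(counts) if counts else 0.0

def p_synonymous(seq1, seq2):
    """Proportion of synonymous differences per synonymous site (pS)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, and of equal length")
    sites = diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguity codes, and stop codons.
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        sites += (syn_sites(c1) + syn_sites(c2)) / 2  # average over the two sequences
        diffs += syn_differences(c1, c2)
    return diffs / sites if sites else float("nan")

# Hypothetical toy alignment (not data from the paper).
human = "ATGCTTCCTGGA"
mouse = "ATGCTCCCAGGA"
print(f"pS = {p_synonymous(human, mouse):.3f}")
```

Averaging over mutational pathways matters only for codons that differ at two or three positions. For single-position differences, the synonymous count is simply 0 or 1.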
Pages: 12