A Conditional Autoregressive Model for Detecting Natural Selection in Protein-Coding DNA Sequences

Authors
Fan, Yu [1]
Wu, Rui [2]
Chen, Ming-Hui [2]
Kuo, Lynn [2]
Lewis, Paul O. [3]
Affiliations
[1] Univ Texas MD Anderson Canc Ctr, Dept Bioinformat & Computat Biol, 1400 Pressler Dr, FCT4-6000, Houston, TX 77030 USA
[2] Univ Connecticut, Dept Stat, Storrs, CT 06269 USA
[3] Univ Connecticut, Dept Ecol Evolut Biol, Storrs, CT 06269 USA
Keywords
EVOLUTIONARY INFERENCE; MOLECULAR ADAPTATION; TERTIARY STRUCTURE; SITES
DOI
10.1007/978-1-4614-7846-1_17
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Subject classification code
070104
Abstract
Phylogenetics, the study of evolutionary relationships among groups of organisms, has played an important role in modern biological research, including genome comparison, detecting orthology and paralogy, estimating divergence times, reconstructing ancestral proteins, identifying mutations likely to be associated with disease, determining the identity of new pathogens, and finding residues that are targets of natural selection. Given an alignment of protein-coding DNA sequences, most methods for detecting natural selection rely on estimating codon-specific nonsynonymous/synonymous rate ratios (dN/dS). Here, we describe an approach to modeling variation in dN/dS using a conditional autoregressive (CAR) model. The CAR model relaxes an assumption made by most contemporary phylogenetic models, namely that sites in molecular sequences evolve independently. By incorporating information stored in the Protein Data Bank (PDB) file, the CAR model estimates dN/dS based on the protein's three-dimensional structure. We implement the model in a fully Bayesian framework, treating all model parameters as random variables, and use NVIDIA's parallel computing architecture (CUDA) to accelerate the computation. Our analysis of an empirical abalone sperm lysin dataset agrees with previous findings.
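The abstract does not give the model's formulas, so the following is only a minimal sketch of a generic proper Gaussian CAR prior of the kind it describes, not the authors' implementation. It assumes that phi_i = log(dN/dS) at codon site i, that two sites are neighbours when their (hypothetical) C-alpha coordinates from a PDB file lie within a hypothetical 8 angstrom cutoff, and that weights are binary. Under these assumptions each site's full conditional is phi_i | phi_-i ~ N(rho * sum_j w_ij phi_j / w_i+, tau2 / w_i+), which is how neighbouring sites in the folded protein share information instead of evolving independently. The function names `contact_weights`, `car_conditional` and `car_log_kernel`, and the parameters `rho` and `tau2`, are illustrative.

```python
import math

# Hypothetical sketch: a proper Gaussian CAR prior on phi_i = log(dN/dS) at codon site i.
# Neighbourhoods, weights, and parameter names are illustrative assumptions,
# not taken from the paper.


def contact_weights(coords, cutoff=8.0):
    """Binary weights: w[i][j] = 1.0 if residues i != j lie within `cutoff` angstroms."""
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                w[i][j] = w[j][i] = 1.0
    return w


def car_conditional(phi, w, i, rho, tau2):
    """Full conditional of phi_i given all other sites: returns (mean, variance)."""
    w_plus = sum(w[i])  # total weight of the structural neighbours of site i
    if w_plus == 0:
        raise ValueError(f"site {i} has no structural neighbours")
    mean = rho * sum(w[i][j] * phi[j] for j in range(len(phi))) / w_plus
    return mean, tau2 / w_plus


def car_log_kernel(phi, w, rho, tau2):
    """Unnormalised joint log density: -phi^T (D_w - rho W) phi / (2 tau2)."""
    n = len(phi)
    quad = 0.0
    for i in range(n):
        quad += sum(w[i]) * phi[i] ** 2  # diagonal term from D_w
        for j in range(n):
            quad -= rho * w[i][j] * phi[i] * phi[j]  # off-diagonal term from rho W
    return -quad / (2.0 * tau2)


if __name__ == "__main__":
    # Four hypothetical C-alpha positions spaced 5 angstroms apart along the x axis,
    # so only consecutive residues fall inside the 8 angstrom cutoff.
    coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0), (15.0, 0.0, 0.0)]
    w = contact_weights(coords)
    log_omega = [math.log(x) for x in (0.2, 0.5, 2.0, 3.0)]  # made-up dN/dS values
    mean, var = car_conditional(log_omega, w, 1, rho=0.9, tau2=1.0)
    # Site 1 has two neighbours (sites 0 and 2), so var = tau2 / 2 = 0.5.
    print("conditional dN/dS median:", math.exp(mean), "log-scale variance:", var)
    print("joint log kernel:", car_log_kernel(log_omega, w, rho=0.9, tau2=1.0))
```

With |rho| < 1 and every site having at least one neighbour, D_w - rho W is strictly diagonally dominant and therefore positive definite, so this joint prior is proper. In a full Bayesian analysis such as the one the abstract describes, this prior would be combined with a codon-model likelihood on the alignment, which the sketch omits.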
Pages: 203–212
Number of pages: 10