Phylogenetic tree construction using trinucleotide usage profile (TUP)

Cited by: 6
Authors
Chen, Si [1, 2]
Deng, Lih-Yuan [3]
Bowman, Dale [3]
Shiau, Jyh-Jen Horng [4]
Wong, Tit-Yee [5]
Madahian, Behrouz [3]
Lu, Henry Horng-Shing [4]
Affiliations
[1] Wuhan Univ, Minist Educ, Key Lab Combinatorial Biosynth & Drug Discovery, Wuhan, Peoples R China
[2] Wuhan Univ, Sch Pharmaceut Sci, Wuhan, Peoples R China
[3] Univ Memphis, Dept Math Sci, Memphis, TN 38152 USA
[4] Natl Chiao Tung Univ, Inst Stat, Hsinchu, Taiwan
[5] Univ Memphis, Dept Biol Sci, Memphis, TN 38152 USA
Source
BMC Bioinformatics | 2016 / Vol. 17
Keywords
Feature frequency profile (FFP); Reading frame; Summary statistics; Phylogenetic tree construction; Tree comparison; REQUIRING SEQUENCE ALIGNMENT; FEATURE FREQUENCY PROFILES; WHOLE-PROTEOME PHYLOGENY; DISSIMILARITY; BACTERIA; DISTANCE; COUNTS; GENUS; 16S
DOI
10.1186/s12859-016-1222-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: It has been a challenging task to build a genome-wide phylogenetic tree for a large group of species containing a large number of genes with long nucleotide sequences. The most popular method, called feature frequency profile (FFP-k), finds the frequency distribution for all words of a certain length k over the whole genome sequence using (overlapping) windows of the same length. For a satisfactory result, the recommended word length (k) ranges from 6 to 15, and it may not be a multiple of 3 (the codon length). The total number of possible words needed for FFP-k can range from 4^6 = 4096 to 4^15.
Results: We propose a simple improvement over the popular FFP method using only a typical word length of 3. A new method, called Trinucleotide Usage Profile (TUP), is proposed based only on the (relative) frequency distribution using non-overlapping windows of length 3. The total number of possible words needed for TUP is 4^3 = 64, which is much less than the total count for the recommended optimal "resolution" for FFP. To build a phylogenetic tree, we propose first representing each of the species by a TUP vector and then using an appropriate distance measure between pairs of the TUP vectors for the tree construction. In particular, we propose summarizing a DNA sequence by a matrix of three rows corresponding to the three reading frames, recording the frequency distribution of the non-overlapping words of length 3 in each reading frame. We also provide a numerical measure for comparing trees constructed with various methods.
Conclusions: Compared to the FFP method, our empirical study showed that the proposed TUP method is more capable of building phylogenetic trees with stronger biological support. We further provide some justification for this from the information theory viewpoint. Unlike the FFP method, the TUP method takes advantage of the fact that the start of the first reading frame is (usually) known. Without this information, the FFP method could only rely on the frequency distribution of overlapping words, which is the average (or mixture) of the frequency distributions of the three possible reading frames. Consequently, we show (from the entropy viewpoint) that the FFP procedure could dilute important gene information and therefore provide less accurate classification.
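The abstract's central idea, a three-row matrix of trinucleotide frequencies (one row per reading frame) versus a pooled profile of overlapping words, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names (`frame_profile`, `tup_matrix`, `ffp3`, `manhattan`), the toy sequence, and the choice of an L1 distance are all assumptions made for illustration; the abstract only says "an appropriate distance measure" without naming one.

```python
from itertools import product
from math import log2

ALPHABET = "ACGT"
# All 4^3 = 64 possible trinucleotides, in a fixed order.
WORDS = ["".join(p) for p in product(ALPHABET, repeat=3)]
INDEX = {w: i for i, w in enumerate(WORDS)}


def frame_profile(seq, offset, step):
    """Relative frequencies of 3-letter words read from `offset` every `step` bases."""
    counts = [0] * len(WORDS)
    for i in range(offset, len(seq) - 2, step):
        word = seq[i:i + 3]
        if word in INDEX:  # skip words containing ambiguous bases such as N
            counts[INDEX[word]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def tup_matrix(seq):
    """3 x 64 matrix: one row per reading frame, non-overlapping windows (step 3).

    Assumes position 0 of `seq` is the start of the first reading frame.
    """
    seq = seq.upper()
    return [frame_profile(seq, frame, 3) for frame in range(3)]


def ffp3(seq):
    """FFP-style profile for k = 3: overlapping windows (step 1), frames pooled."""
    return frame_profile(seq.upper(), 0, 1)


def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    return -sum(x * log2(x) for x in p if x > 0)


def manhattan(a, b):
    """L1 distance between two TUP matrices (an illustrative choice, not from the paper)."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    toy = "ATGGCCAAA" * 4  # hypothetical coding sequence, frame 0 starts at position 0
    m = tup_matrix(toy)
    pooled = ffp3(toy)
    print("entropy per frame:", [round(entropy(row), 3) for row in m])
    print("entropy of pooled overlapping profile:", round(entropy(pooled), 3))
```

On this toy sequence each reading frame uses only three distinct words, while the pooled overlapping profile spreads its mass over more words and so has higher entropy. This mirrors the abstract's point that the overlapping profile is a mixture of the three frame distributions. A distance matrix built with `manhattan` over per-species TUP matrices could then be passed to any standard distance-based tree method.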
Pages: 14