Clustering DNA sequences by feature vectors

Cited by: 24
Authors
Liu, Libin
Ho, Yee-kin
Yau, Stephen
Affiliations
[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60607 USA
[2] Univ Illinois, Dept Biochem & Mol Genet, Chicago, IL 60607 USA
Keywords
DNA sequences; genomic space; vector distance; global comparison of gene structures
DOI
10.1016/j.ympev.2006.05.019
Chinese Library Classification (CLC) codes
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We represent all DNA sequences as points in a twelve-dimensional space in such a way that homologous DNA sequences are clustered together, creating a new genomic space in which millions of genes can be compared globally and simultaneously. More specifically, based on the contents of the four nucleotides, their distances from the origin, and their distribution along the sequence, a twelve-dimensional vector is assigned to any DNA sequence. The applicability of this analysis to the global comparison of gene structures was tested on the myoglobin, beta-globin, histone-4, lysozyme, and rhodopsin families. Members of the same family exhibit smaller vector distances than members of different families. The vector distance also distinguishes random sequences generated with the same base composition. Sequence comparisons were consistent with the BLAST method. Once a new gene is discovered, its location in our genomic space can be computed. It is natural to predict that the properties of this new gene are similar to those of known genes located nearby. Biologists can then design experiments to test these properties. (c) 2006 Elsevier Inc. All rights reserved.
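The abstract names three ingredients per nucleotide (content, distance from the origin, distribution along the sequence) but does not give the formulas. The following is a minimal sketch under stated assumptions, not the authors' definition: it takes the count of each base, the mean 1-based position of that base, and the variance of those positions, which yields 4 x 3 = 12 components. The function names feature_vector and vector_distance, the base order A, C, G, T, and the use of Euclidean distance are all hypothetical choices made for illustration.

```python
from math import sqrt

BASES = "ACGT"  # assumed ordering of the four nucleotides


def feature_vector(seq):
    """Return a 12-component vector for a DNA sequence.

    For each base, three numbers are emitted (an assumed reading of the
    abstract, not the paper's exact formulas):
      count of the base, mean 1-based position, variance of positions.
    """
    seq = seq.upper()
    vec = []
    for base in BASES:
        positions = [i for i, ch in enumerate(seq, start=1) if ch == base]
        n = len(positions)
        if n == 0:
            # Base absent: use zeros so the vector always has 12 entries.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / n
        var = sum((p - mean) ** 2 for p in positions) / n
        vec.extend([float(n), mean, var])
    return vec


def vector_distance(u, v):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


if __name__ == "__main__":
    a = feature_vector("ATGGTGCACCTGACT")
    b = feature_vector("ATGGTGCATCTGACT")
    print(vector_distance(a, b))
```

Under this reading, sequences with similar composition and similar placement of each base map to nearby points, so a new sequence can be located by computing its vector and ranking known sequences by distance. The components have very different scales (counts versus position variances), so some normalization would likely be needed before distances are compared across sequences of different lengths.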
Pages: 64-69
Number of pages: 6
Related papers
50 in total
  • [31] Clustering-Based Compression for Population DNA Sequences
    Cheng, Kin-On
    Law, Ngai-Fong
    Siu, Wan-Chi
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (01) : 208 - 221
  • [32] DeLUCS: Deep learning for unsupervised clustering of DNA sequences
    Arias, Pablo Milla
    Alipour, Fatemeh
    Hill, Kathleen A.
    Kari, Lila
    PLOS ONE, 2022, 17 (01):
  • [33] Clustering of identical oligomers in coding and noncoding DNA sequences
    Stanley, RHR
    Dokholyan, NV
    Buldyrev, SV
    Havlin, S
    Stanley, HE
    JOURNAL OF BIOMOLECULAR STRUCTURE & DYNAMICS, 1999, 17 (01) : 79 - 87
  • [34] RJMCMC Learning for Clustering and Feature Selection of L2-Normalized Vectors
    Amayri, Ola
    Bouguila, Nizar
    2016 INTERNATIONAL CONFERENCE ON CONTROL, DECISION AND INFORMATION TECHNOLOGIES (CODIT), 2016, : 269 - 274
  • [35] Effects of Clustering Feature Vectors on Bus Travel Time Prediction: A Case Study
    Shaji, Hima Elsa
    Vanajakshi, Lelitha
    Tangirala, Arun K.
    2021 INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS (COMSNETS), 2021, : 741 - 746
  • [36] Bayesian Clustering of Fuzzy Feature Vectors Using a Quasi-Likelihood Approach
    Marttinen, Pekka
    Tang, Jing
    De Baets, Bernard
    Dawyndt, Peter
    Corander, Jukka
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2009, 31 (01) : 74 - 85
  • [37] A method of relational fuzzy clustering based on producing feature vectors using FastMap
    Brouwer, Roelof Kars
    INFORMATION SCIENCES, 2009, 179 (20) : 3561 - 3582
  • [38] Feature extraction from DNA sequences by multifractal analysis
    Zhang, H
    Kinsner, W
    PROCEEDINGS OF THE 23RD ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-4: BUILDING NEW BRIDGES AT THE FRONTIERS OF ENGINEERING AND MEDICINE, 2001, 23 : 1567 - 1572
  • [39] Feature extraction from DNA sequences by fractal analysis
    Kinsner, W
    Zhang, H
    IEEE-EMBS ASIA PACIFIC CONFERENCE ON BIOMEDICAL ENGINEERING - PROCEEDINGS, PTS 1 & 2, 2000, : 147 - 148
  • [40] Computational discovery of feature patterns in nucleosomal DNA sequences
    Zheng, Yiyu
    Li, Xiaoman
    Hu, Haiyan
    GENOMICS, 2014, 104 (02) : 87 - 95