Clustering DNA sequences by feature vectors

Cited by: 24
Authors
Liu, Libin
Ho, Yee-kin
Yau, Stephen
Affiliations
[1] Univ Illinois, Dept Math Stat & Comp Sci, Chicago, IL 60607 USA
[2] Univ Illinois, Dept Biochem & Mol Genet, Chicago, IL 60607 USA
Keywords
DNA sequences; genomic space; vector distance; global comparison of gene structures
DOI
10.1016/j.ympev.2006.05.019
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
We represent all DNA sequences as points in twelve-dimensional space in such a way that homologous DNA sequences are clustered together, creating a new genomic space for the global comparison of millions of genes simultaneously. More specifically, based on the contents of the four nucleotides, their distances from the origin, and their distribution along the sequence, a twelve-dimensional vector is assigned to any DNA sequence. The applicability of this analysis to the global comparison of gene structures was tested on the myoglobin, beta-globin, histone-4, lysozyme, and rhodopsin families. Members of each family exhibit smaller vector distances than members of different families. The vector distance also distinguishes random sequences generated with the same base composition. Sequence comparisons showed consistency with the BLAST method. Once a new gene is discovered, we can compute its location in our genomic space. It is natural to predict that the properties of this new gene are similar to those of known genes located nearby. Biologists can do various experiments to test these properties. (c) 2006 Elsevier Inc. All rights reserved.
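The abstract does not state the formulas behind the twelve components, so the following is only a minimal sketch under stated assumptions. For each of the four nucleotides it uses three statistics: the count (standing in for "contents"), the mean 1-based position (standing in for "distance from the origin"), and the variance of positions (standing in for "distribution along the sequence"). These choices, and the names feature_vector, vector_distance, and nearest_gene, are hypothetical illustrations, not the authors' definitions.

```python
import math


def feature_vector(seq):
    """Map a DNA string to a 12-dimensional vector.

    ASSUMPTION: per nucleotide (A, C, G, T) we store the count, the mean
    1-based position, and the variance of positions. The abstract does not
    give the exact formulas; these are illustrative stand-ins.
    """
    seq = seq.upper()
    vec = []
    for base in "ACGT":
        positions = [i for i, ch in enumerate(seq, start=1) if ch == base]
        n = len(positions)
        if n == 0:
            # Nucleotide absent: all three statistics are set to zero.
            vec.extend([0.0, 0.0, 0.0])
            continue
        mean = sum(positions) / n
        var = sum((p - mean) ** 2 for p in positions) / n
        vec.extend([float(n), mean, var])
    return vec


def vector_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_gene(new_seq, known):
    """Return the name in `known` (a dict of name -> sequence) whose
    feature vector lies closest to that of `new_seq`."""
    target = feature_vector(new_seq)
    return min(known, key=lambda name: vector_distance(target, feature_vector(known[name])))
```

In this sketch, locating a newly discovered gene amounts to computing its vector once and comparing it against precomputed vectors of known genes, which is what makes comparison across a large collection cheap relative to pairwise alignment.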
Pages: 64-69
Number of pages: 6
Related papers
50 in total
  • [1] Similarity Analysis of DNA Barcodes Sequences Based on Compressed Feature Vectors
    Yu, Hong-Jie
    [J]. BIO-INSPIRED COMPUTING AND APPLICATIONS, 2012, 6840 : 470 - 477
  • [2] Spectral feature vectors for graph clustering
    Luo, Bin
    Wilson, Richard C.
    Hancock, Edwin R.
    [J]. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2002, 2396 : 83 - 93
  • [3] DNA Sequences Vectors And Their Compaction
    Logofatu, Doina
    [J]. BICS 2008: PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTATIONAL METHODS USED FOR SOLVING DIFFICULT PROBLEMS-DEVELOPMENT OF INTELLIGENT AND COMPLEX SYSTEMS, 2008, 1117 : 29 - 39
  • [4] On The Compaction Of DNA Sequences Vectors
    Logofatu, Doina
    [J]. ADVANCED BIO-INSPIRED COMPUTATIONAL METHODS, 2008, : 49 - 60
  • [5] DNA Sequences Vectors and Their Ordering
    Logofatu, Doina
    Gruber, Manfred
    [J]. BICS 2008: PROCEEDINGS OF THE 1ST INTERNATIONAL CONFERENCE ON BIO-INSPIRED COMPUTATIONAL METHODS USED FOR SOLVING DIFFICULT PROBLEMS-DEVELOPMENT OF INTELLIGENT AND COMPLEX SYSTEMS, 2008, 1117 : 3 - +
  • [6] Classifying DNA barcode multi-locus sequences with feature vectors and supervised approaches
    Weitschek, Emanuel
    Fiscon, Giulia
    Bertolazzi, Paola
    Felici, Giovanni
    [J]. GENOME, 2015, 58 (05) : 295 - 295
  • [7] Spectral Subspace Clustering for Graphs with Feature Vectors
    Guennemann, Stephan
    Faerber, Ines
    Raubach, Sebastian
    Seidl, Thomas
    [J]. 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2013, : 231 - 240
  • [8] Clustering analysis of phonetic and text feature vectors
    Jicinsky, Milan
    Marek, Jaroslav
    [J]. 2017 IEEE 14TH INTERNATIONAL SCIENTIFIC CONFERENCE ON INFORMATICS, 2017, : 146 - 151
  • [9] Clustering feature vectors with mixed numerical and categorical attributes
    Brouwer, Roelof K.
    [J]. INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2008, 1 (04) : 285 - 298
  • [10] A method of clustering feature vectors via incremental iteration
    Huang, Rui
    Sang, Nong
    Liu, Le-Yuan
    Luo, Da-Peng
    Tang, Qi-Ling
    [J]. Moshi Shibie yu Rengong Zhineng/Pattern Recognition and Artificial Intelligence, 2010, 23 (03) : 320 - 326