Analysis and comparison of information theory-based distances for genomic strings

Citations: 0
Authors
Balzano, Walter [1 ]
Cicalese, Ferdinando [2 ]
Del Sorbo, Maria Rosaria [3 ]
Vaccaro, Ugo [4 ]
Affiliations
[1] Univ Naples Federico II, Dipartimento Sci Fis, Complesso Univ Monte St Angelo,Via Cintia, I-80126 Naples, Italy
[2] Univ Bielefeld, Tech Fakultaet, AG Genominformat, Bielefeld, Germany
[3] Univ Naples Federico II, Dipartimento Matemat & Applicaz, I-80126 Naples, Italy
[4] Univ Salerno, Dipartimento Informat & Applicaz, I-84084 Fisciano, Italy
Keywords
alignment-free genomic string distance; information; entropy;
DOI
Not available
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
Genomic string comparison via alignment is widely applied for mining and retrieving information in biological databases. In some situations, the effectiveness of such alignment-based comparison is still unclear, e.g., for sequences with non-uniform length and with significant shuffling of identical substrings. An alternative approach is based on information-theoretic distances. The information content of biological data is stored in very long strings over only four characters. In the last ten years, several entropic measures have been proposed for genomic string analysis. Notwithstanding their individual merits and experimental validation, to the best of our knowledge there is no direct comparison of these different metrics. We present four of the most representative alignment-free distance measures based on mutual information. Each has a different origin and expression. Our comparison recasts them in a single formalism, so that it has been possible to construct a phylogenetic tree for each of them. The trees produced via these metrics are compared to those widely accepted as biologically validated. In general, the results provide further evidence of the reliability of alignment-free distance models. We also observe that one of the metrics appears to be more robust than the other three. We believe this result merits further research and observation. Many of the experimental results, the graphics and the table are available at the following URL: http://people.na.infn.it/~wbalzano/BIO.
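To make the idea of an alignment-free, information-theoretic distance concrete, the following is a minimal sketch only. It is not one of the four measures compared in the paper, which this record does not specify. It computes the Jensen-Shannon divergence between the k-mer frequency distributions of two strings; with equal weights, this quantity equals the mutual information between a k-mer and the label of the string it was drawn from. The function names and the default k-mer length `k=3` are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def kmer_distribution(seq, k):
    """Relative frequencies of the overlapping k-mers in seq."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits of a frequency dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def js_divergence(seq_a, seq_b, k=3):
    """Jensen-Shannon divergence (bits) between two k-mer distributions.

    Illustrative stand-in for an entropy-based distance; k=3 is an
    assumed default, not a value taken from the paper.
    """
    p = kmer_distribution(seq_a, k)
    q = kmer_distribution(seq_b, k)
    # Equal-weight mixture of the two distributions.
    mix = {kmer: 0.5 * p.get(kmer, 0.0) + 0.5 * q.get(kmer, 0.0)
           for kmer in set(p) | set(q)}
    return entropy(mix) - 0.5 * entropy(p) - 0.5 * entropy(q)


if __name__ == "__main__":
    # Toy strings, invented for illustration only.
    toy = {"s1": "ACGTACGTACGGTACG", "s2": "ACGTACGAACGGTACG", "s3": "TTTTGGGGCCCCAAAA"}
    names = sorted(toy)
    for a in names:
        print(a, [round(js_divergence(toy[a], toy[b]), 3) for b in names])
```

No alignment is performed, so the two strings may differ in length. With base-2 logarithms the value lies between 0 and 1. A pairwise matrix of such values is the kind of input that a distance-based tree-building method takes.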
Pages: 292 / +
Page count: 3