Extended many-item similarity indices for sets of nucleotide and protein sequences
Cited by: 11
Authors:
Bajusz, David [1]
Miranda-Quintana, Ramon Alain [2,3]
Racz, Anita [4]
Heberger, Karoly [4]
Affiliations:
[1] Res Ctr Nat Sci, Med Chem Res Grp, Magyar Tudosok Krt 2, H-1117 Budapest, Hungary
[2] Univ Florida, Dept Chem, Gainesville, FL 32611 USA
[3] Univ Florida, Quantum Theory Project, Gainesville, FL 32611 USA
[4] Res Ctr Nat Sci, Plasma Chem Res Grp, Magyar Tudosok Krt 2, H-1117 Budapest, Hungary
Source:
Keywords:
Multiple comparisons;
DNA sequences;
Protein sequences;
Diversity analysis;
Similarity indices;
Consistency;
ANOVA;
Human protein kinases;
Human SH2 domains;
Cytochrome P450;
CELL-FORMATION;
SEARCH;
COEFFICIENTS;
COMPLEMENT;
SELECTION;
GENE;
DOI:
10.1016/j.csbj.2021.06.021
Chinese Library Classification (CLC) codes:
Q5 [Biochemistry];
Q7 [Molecular biology];
Discipline classification codes:
071010;
081704;
Abstract:
Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows, and is usually accomplished by pairwise comparisons of sequences, utilizing simple (e.g. percent identity) or more intricate concepts (e.g. substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., direct calculation of the similarity of fingerprint sets), here we introduce novel symmetric similarity indices for analogous calculations on sets of character sequences with more than two (t) possible items (e.g. DNA/RNA sequences with t = 4, or protein sequences with t = 20). The features of these new indices are studied in detail with analysis of variance (ANOVA), and demonstrated with three case studies of protein/DNA sequences with varying degrees of similarity (or evolutionary proximity). The Python code for the extended many-item similarity indices is publicly available at: https://github.com/ramirandaq/tn_Comparisons. © 2021 Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
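Illustrative note (not part of the published abstract): the conventional pairwise approach that the abstract contrasts with direct set-level indices can be sketched as follows. This is a minimal sketch of mean pairwise percent identity only; it is not the authors' extended many-item indices and not the code in their repository. The function names `percent_identity` and `mean_pairwise_identity` are hypothetical, and the sequences are assumed to be already aligned to equal length.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences match.

    Assumption: both sequences are already aligned and of equal length.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must be non-empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_pairwise_identity(seqs: list[str]) -> float:
    """Average percent identity over all unordered pairs of sequences.

    This is the pairwise baseline: a set of n sequences requires
    n * (n - 1) / 2 individual comparisons.
    """
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy DNA sequences (t = 4 possible items per position).
    toy = ["ACGT", "ACGT", "ACGA"]
    print(mean_pairwise_identity(toy))
```

Because the number of pairs grows quadratically with the number of sequences, this baseline becomes costly for large sets, which is the motivation the abstract gives for quantifying set similarity directly.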
Pages: 3628-3639
Page count: 12