SEQUENCE ORDINATIONS - A MULTIVARIATE-ANALYSIS APPROACH TO ANALYZING LARGE SEQUENCE DATA SETS

Cited by: 0

Authors:

HIGGINS, DG

Affiliation:

Source:

COMPUTER APPLICATIONS IN THE BIOSCIENCES | 1992 / Vol. 8 / Issue 01

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Ordination is a powerful method for analysing complex data sets but has been largely ignored in sequence analysis. This paper shows how to use principal coordinates analysis to find low-dimensional representations of distance matrices derived from aligned sets of sequences. The method takes a matrix of Euclidean distances between all pairs of sequences and finds a coordinate space where the distances are exactly preserved. The main problem is to find a measure of distance between aligned sequences that is Euclidean. The simplest distance function is the square root of the percentage difference (as measured by identities) between two sequences, where one ignores any positions in the alignment where there is a gap in any sequence. If one does not ignore positions with a gap, the distances cannot be guaranteed to be Euclidean but the deleterious effects are trivial. Two examples of using the method are shown. A set of 226 aligned globins was analysed and the resulting ordination very successfully represents the known patterns of relationship between the sequences. In the other example, a set of 610 aligned 5S rRNA sequences was analysed. Sequence ordinations complement phylogenetic analyses. They should not be viewed as a complete alternative.
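
The two steps described in the abstract (a Euclidean distance between aligned sequences, then principal coordinates analysis) can be sketched roughly as follows. This is a minimal illustration using NumPy, not the paper's own program: the function names `sqrt_percent_difference` and `principal_coordinates`, the `-` gap character, the tolerance, and the toy alignment are all assumptions made for the example. The ordination step uses the classical double-centring formulation of principal coordinates analysis.

```python
import numpy as np

GAP = "-"  # assumed gap character; the abstract does not specify one


def sqrt_percent_difference(aligned):
    """Pairwise distance matrix: square root of the percentage of differing
    positions, ignoring every column that has a gap in any sequence."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only the alignment columns that are gap-free in all sequences.
    cols = [i for i in range(length) if all(s[i] != GAP for s in aligned)]
    if not cols:
        raise ValueError("no gap-free columns in the alignment")
    n = len(aligned)
    d = np.zeros((n, n))
    for a in range(n):
        for b in range(a + 1, n):
            diffs = sum(aligned[a][i] != aligned[b][i] for i in cols)
            d[a, b] = d[b, a] = np.sqrt(100.0 * diffs / len(cols))
    return d


def principal_coordinates(d, tol=1e-9):
    """Classical principal coordinates analysis of a distance matrix.
    Returns (coords, eigenvalues); the columns of coords are axes ordered
    by decreasing eigenvalue."""
    n = d.shape[0]
    centre = np.eye(n) - np.ones((n, n)) / n
    # Double-centre the squared distances to get an inner-product matrix.
    b = -0.5 * centre @ (d ** 2) @ centre
    evals, evecs = np.linalg.eigh(b)
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]
    # Keep axes with clearly positive eigenvalues (relative tolerance).
    keep = evals > tol * max(evals.max(), 1.0)
    coords = evecs[:, keep] * np.sqrt(evals[keep])
    return coords, evals


# Toy alignment (hypothetical data, for illustration only).
aligned = ["ACGT-A", "ACCTTA", "AGGTTC", "TCGATC"]
dist = sqrt_percent_difference(aligned)
coords, evals = principal_coordinates(dist)
print(coords[:, :2])  # first two axes, e.g. for a scatter plot
```

With a Euclidean distance such as this one, all eigenvalues should be non-negative up to rounding error, and the distances between rows of `coords` reproduce the input matrix. Plotting only the first two or three columns gives the low-dimensional view; negative eigenvalues of noticeable size would suggest the distance measure is not Euclidean, as may happen when gapped positions are not ignored.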


Pages: 15-22

Number of pages: 8
