Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison

Cited by: 38
Authors
Green, RE [1]
Brenner, SE
Affiliations
[1] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
Funding
National Institutes of Health (USA)
Keywords
BLAST; bootstrap; FASTA; homology; normalization; sequence; sequence analysis; Structural Classification of Proteins (SCOP); substitution matrix
DOI
10.1109/JPROC.2002.805303
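Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809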
Abstract
The exponentially growing library of known protein sequences represents molecules connected by an intricate network of evolutionary and functional relationships. To reveal these relationships, virtually every molecular biology experiment incorporates computational sequence analysis. The workhorse methods for this task make alignments between two sequences to measure their similarity. Informed use of these methods, such as NCBI BLAST [1], WU-BLAST [2], FASTA [3] and SSEARCH, requires understanding of their effectiveness. To permit informed sequence analysis, we have assessed the effectiveness of modern versions of these algorithms using the trusted relationships among ASTRAL [4] sequences in the Structural Classification of Proteins [5] database classification of protein structures [6]. We have reduced database representation artifacts through the use of a normalization method that addresses the uneven distribution of superfamily sizes. To allow for more meaningful and interpretable comparisons of results, we have implemented a bootstrapping procedure. We find that the most difficult pairwise relations to detect are those between members of larger superfamilies, and our test set is biased toward these. However, even when results are normalized, most distant evolutionary relationships elude detection.
Pages: 1834-1847
Page count: 14
Related papers
50 in total
  • [41] Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case
    Nikulasdottir, Anna Bjork
    Gudnason, Jon
    INTERSPEECH 2019, 2019, : 4455 - 4459
  • [42] Enhanced Bootstrapping Algorithm for Automatic Annotation of Tweets
    Mohd, Mudasir
    Jan, Rafiya
    Hakak, Nida
    INTERNATIONAL JOURNAL OF COGNITIVE INFORMATICS AND NATURAL INTELLIGENCE, 2020, 14 (02) : 35 - 60
  • [43] The Evidential Statistics of Genetic Assembly: Bootstrapping a Reference Sequence
    Toquenaga, Yukihiko
    Gagne, Takuya
    FRONTIERS IN ECOLOGY AND EVOLUTION, 2021, 9
  • [44] On the spectrum of pairwise comparison matrices
    Farkas, A
    György, A
    Rózsa, P
    LINEAR ALGEBRA AND ITS APPLICATIONS, 2004, 385 : 443 - 462
  • [45] Optimal Network Pairwise Comparison
    Jin, Jiashun
    Ke, Zheng Tracy
    Luo, Shengming
    Ma, Yucong
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024
  • [46] Pairwise Comparison in Repeated Measures
    Oyeka, I. C. A.
    Nnanatu, C. C.
    JOURNAL OF MODERN APPLIED STATISTICAL METHODS, 2014, 13 (02) : 151 - 168
  • [47] ORDERING PAIRWISE COMPARISON STRUCTURES
    DELVER, R
    MONSUUR, H
    STORCKEN, AJA
    THEORY AND DECISION, 1991, 31 (01) : 75 - 94
  • [48] Ranking and selection for pairwise comparison
    Xiao, Hui
    Zhang, Yao
    Kou, Gang
    Zhang, Si
    Branke, Juergen
    NAVAL RESEARCH LOGISTICS, 2023, 70 (03) : 284 - 302
  • [49] PAIRWISE COMPARISON AND RANKING IN TOURNAMENTS
    BUHLMANN, H
    HUBER, PJ
    ANNALS OF MATHEMATICAL STATISTICS, 1963, 34 (02) : 501 - &
  • [50] Characterization of pairwise and multiple sequence alignment errors
    Landan, Giddy
    Graur, Dan
    GENE, 2009, 441 (1-2) : 141 - 147