Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison

Cited by: 38
Authors
Green, RE [1]
Brenner, SE
Affiliations
[1] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
Funding
National Institutes of Health (NIH);
Keywords
BLAST; bootstrap; FASTA; homology; normalization; sequence; sequence analysis; Structural Classification of Proteins (SCOP); substitution matrix;
DOI
10.1109/JPROC.2002.805303
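Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification code
0808; 0809;
Abstract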
The exponentially growing library of known protein sequences represents molecules connected by an intricate network of evolutionary and functional relationships. To reveal these relationships, virtually every molecular biology experiment incorporates computational sequence analysis. The workhorse methods for this task align two sequences to measure their similarity. Informed use of these methods, such as NCBI BLAST [1], WU-BLAST [2], FASTA [3], and SSEARCH, requires an understanding of their effectiveness. To permit informed sequence analysis, we have assessed the effectiveness of modern versions of these algorithms using the trusted relationships among ASTRAL [4] sequences, as classified by the Structural Classification of Proteins (SCOP) [5] database of protein structures [6]. We have reduced database representation artifacts by using a normalization method that addresses the uneven distribution of superfamily sizes. To allow for more meaningful and interpretable comparisons of results, we have implemented a bootstrapping procedure. We find that the most difficult pairwise relations to detect are those between members of larger superfamilies, and our test set is biased toward these. However, even when results are normalized, most distant evolutionary relationships elude detection.
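The two ideas named in the abstract, normalizing for uneven superfamily sizes and bootstrapping the evaluation, can be illustrated with a minimal Python sketch. This is not the authors' procedure, which the abstract does not specify. It assumes, purely for illustration, that each true relation is recorded as a (superfamily, detected) pair, that normalization means averaging per-superfamily coverage, and that the bootstrap resamples whole superfamilies with replacement. The function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

def raw_coverage(pairs):
    """Fraction of true relations detected, with every pair weighted equally."""
    return sum(detected for _, detected in pairs) / len(pairs)

def normalized_coverage(pairs):
    """Mean of per-superfamily coverage, so each superfamily counts once
    regardless of how many pairs it contributes (an assumed scheme)."""
    by_family = defaultdict(list)
    for family, detected in pairs:
        by_family[family].append(detected)
    return sum(sum(v) / len(v) for v in by_family.values()) / len(by_family)

def bootstrap_interval(pairs, statistic, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile interval obtained by resampling superfamilies with replacement."""
    rng = random.Random(seed)
    by_family = defaultdict(list)
    for family, detected in pairs:
        by_family[family].append(detected)
    families = list(by_family)
    estimates = []
    for _ in range(n_resamples):
        sample = []
        for i in range(len(families)):
            chosen = rng.choice(families)
            # Relabel with the draw index so a superfamily drawn twice
            # is treated as two separate groups in the resample.
            sample.extend((i, d) for d in by_family[chosen])
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper

# Hypothetical toy data: one large, hard superfamily and two small ones.
pairs = (
    [("big", False)] * 8 + [("big", True)] * 2
    + [("small_a", True)] * 2
    + [("small_b", True), ("small_b", False)]
)

print("raw coverage:", raw_coverage(pairs))
print("normalized coverage:", normalized_coverage(pairs))
print("bootstrap interval:", bootstrap_interval(pairs, normalized_coverage))
```

In this toy set the large superfamily dominates the raw figure, so the normalized coverage comes out higher than the raw coverage. That is the kind of database representation effect the abstract describes, and the bootstrap interval gives a rough sense of how much the estimate depends on which superfamilies happen to be in the test set.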
Pages: 1834-1847
Page count: 14