Bootstrapping and normalization for enhanced evaluations of pairwise sequence comparison

Cited by: 38
Authors
Green, RE [1]
Brenner, SE
Affiliations
[1] Univ Calif Berkeley, Dept Plant & Microbial Biol, Berkeley, CA 94720 USA
[2] Univ Calif Berkeley, Dept Mol & Cell Biol, Berkeley, CA 94720 USA
Funding
National Institutes of Health (USA);
Keywords
BLAST; bootstrap; FASTA; homology; normalization; sequence; sequence analysis; Structural Classification of Proteins (SCOP); substitution matrix;
DOI
10.1109/JPROC.2002.805303
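Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline classification codes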
0808 ; 0809 ;
Abstract
The exponentially growing library of known protein sequences represents molecules connected by an intricate network of evolutionary and functional relationships. To reveal these relationships, virtually every molecular biology experiment incorporates computational sequence analysis. The workhorse methods for this task make alignments between two sequences to measure their similarity. Informed use of these methods, such as NCBI BLAST [1], WU-BLAST [2], FASTA [3] and SSEARCH, requires understanding of their effectiveness. To permit informed sequence analysis, we have assessed the effectiveness of modern versions of these algorithms using the trusted relationships among ASTRAL [4] sequences in the Structural Classification of Proteins [5] database classification of protein structures [6]. We have reduced database representation artifacts through the use of a normalization method that addresses the uneven distribution of superfamily sizes. To allow for more meaningful and interpretable comparisons of results, we have implemented a bootstrapping procedure. We find that the most difficult pairwise relations to detect are those between members of larger superfamilies, and our test set is biased toward these. However, even when results are normalized, most distant evolutionary relationships elude detection.
Pages: 1834 - 1847
Page count: 14
Related papers
50 in total
  • [21] Parametric bootstrapping for biological sequence motifs
    Patrick K. O’Neill
    Ivan Erill
    BMC Bioinformatics, 17
  • [22] tuple_plot: Fast pairwise nucleotide sequence comparison with noise suppression
    Szafranski, Karol
    Jahn, Niels
    Platzer, Matthias
    BIOINFORMATICS, 2006, 22 (15) : 1917 - 1918
  • [23] iPAK: An in situ pairwise key bootstrapping scheme for wireless sensor networks
    Ma, Liran
    Cheng, Xiuzhen
    Liu, Fang
    An, Fengguang
    IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2007, 18 (08) : 1174 - 1184
  • [24] Pairwise Normalization in SimRank Variants: Problem, Solution, and Evaluation
    Hamedani, Masoud Reyhani
    Kim, Sang-Wook
    SAC '19: PROCEEDINGS OF THE 34TH ACM/SIGAPP SYMPOSIUM ON APPLIED COMPUTING, 2019 : 534 - 541
  • [25] DNorm: disease name normalization with pairwise learning to rank
    Leaman, Robert
    Dogan, Rezarta Islamaj
    Lu, Zhiyong
    BIOINFORMATICS, 2013, 29 (22) : 2909 - 2917
  • [26] Extended Pairwise Sequence Alignment
    Araujo, Eloi
    Martinez, Fabio V.
    Rozante, Luiz C.
    Almeida, Nalvo F.
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS, ICCSA 2023, PT I, 2023, 13956 : 218 - 230
  • [27] Using Markov model to improve word normalization algorithm for biological sequence comparison
    Dai, Qi
    Liu, Xiaoqing
    Yao, Yuhua
    Zhao, Fukun
    AMINO ACIDS, 2012, 42 (05) : 1867 - 1877
  • [28] Using Markov model to improve word normalization algorithm for biological sequence comparison
    Qi Dai
    Xiaoqing Liu
    Yuhua Yao
    Fukun Zhao
    Amino Acids, 2012, 42 : 1867 - 1877
  • [29] PAIRWISE COMPARISON AND RANKING
    BUHLMANN, H
    HUBER, P
    ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (02) : 825 - &
  • [30] JacSim*: An Effective and Efficient Solution to the Pairwise Normalization Problem in SimRank
    Hamedani, Masoud Reyhani
    Kim, Sang-Wook
    IEEE ACCESS, 2021, 9 : 146038 - 146049