Searching DNA databases for similarities to DNA sequences: when is a match significant?

Cited by: 73
Authors
Anderson, I [1]
Brass, A [1]
Affiliation
[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England
DOI
10.1093/bioinformatics/14.4.349
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Searching DNA sequences against a DNA database is an essential element of sequence analysis. However, few systematic studies have been carried out to determine when a match between two DNA sequences has biological significance, and this is limiting the use that can be made of DNA searching algorithms. Results: A test set of DNA sequences has been constructed consisting of artificially evolved and real sequences. This set has been used to test various database searching algorithms (BLAST, BLAST2, FASTA and Smith-Waterman) on a subset of the EMBL database. The results of this analysis have been used to determine the sensitivity and coverage of all of the algorithms. Guidelines have been produced which can be used to assess the significance of DNA database search results. The Smith-Waterman algorithm was shown to have the best coverage, but the worst sensitivity, whereas the default BLASTN algorithm (word length set to 11) was shown to have good sensitivity, but poor coverage. A sensible compromise between speed, sensitivity and coverage can be obtained using either the FASTA or BLAST (word length set to 6) algorithms. However, analysis of the results also showed that no algorithm works well when the length of the probe sequence is <200 bases. In general, matches can accurately be identified between coding regions of DNA sequences when there is >35% sequence identity between the corresponding proteins. Searching a DNA sequence against a DNA sequence database can, therefore, be a useful tool in sequence analysis.
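The abstract contrasts word lengths of 11 and 6. As a rough illustration of why word length could affect coverage, the sketch below checks whether two sequences share any exact word of a given length, which is the general idea behind word-based seeding. It is a toy example and not the evaluation method used in the paper: the function names, the test sequence, and the mutation pattern are all hypothetical, and real search programs do considerably more than this seed check.

```python
def kmers(seq, w):
    """Return the set of all overlapping words of length w in seq."""
    return {seq[i:i + w] for i in range(len(seq) - w + 1)}


def has_seed(query, subject, w):
    """True if query and subject share at least one exact word of length w."""
    return not kmers(query, w).isdisjoint(kmers(subject, w))


def percent_identity(a, b):
    """Percentage of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mutate_every(seq, step, base="T"):
    """Replace every step-th base (indices step-1, 2*step-1, ...) with base."""
    chars = list(seq)
    for i in range(step - 1, len(chars), step):
        chars[i] = base
    return "".join(chars)


# Hypothetical 32-base query containing no T, so that every substituted
# position in the subject is guaranteed to differ from the query.
query = "ACGGACAGCAGGACCGAGCAACGGCAGACGAC"
subject = mutate_every(query, 8)  # one substitution in every 8 bases

print(percent_identity(query, subject))  # 87.5
print(has_seed(query, subject, 6))       # True
print(has_seed(query, subject, 11))      # False
```

In this constructed case the two sequences are 87.5% identical, yet the longest uninterrupted matching run is 7 bases, so a word length of 6 finds a seed while a word length of 11 finds none.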
Pages: 349-356
Number of pages: 8