Searching DNA databases for similarities to DNA sequences: when is a match significant?

Cited by: 73

Authors:

Anderson, I [1]

Brass, A [1]

Affiliation:

[1] Univ Manchester, Sch Biol Sci, Manchester M13 9PT, Lancs, England

Source:

BIOINFORMATICS | 1998 / Vol. 14 / Issue 04

DOI:

10.1093/bioinformatics/14.4.349

Chinese Library Classification number:

Q5 [Biochemistry];

Discipline classification codes:

071010 ; 081704 ;

Abstract:

Motivation: Searching DNA sequences against a DNA database is an essential element of sequence analysis. However, few systematic studies have been carried out to determine when a match between two DNA sequences has biological significance, and this is limiting the use that can be made of DNA searching algorithms. Results: A test set of DNA sequences has been constructed consisting of artificially evolved and real sequences. This set has been used to test various database searching algorithms (BLAST, BLAST2, FASTA and Smith-Waterman) on a subset of the EMBL database. The results of this analysis have been used to determine the sensitivity and coverage of all of the algorithms. Guidelines have been produced which can be used to assess the significance of DNA database search results. The Smith-Waterman algorithm was shown to have the best coverage, but the worst sensitivity, whereas the default BLASTN algorithm (word length set to 11) was shown to have good sensitivity, but poor coverage. A sensible compromise between speed, sensitivity and coverage can be obtained using either the FASTA or BLAST (word length set to 6) algorithms. However, analysis of the results also showed that no algorithm works well when the length of the probe sequence is <200 bases. In general, matches can accurately be identified between coding regions of DNA sequences when there is >35% sequence identity between the corresponding proteins. Searching a DNA sequence against a DNA sequence database can, therefore, be a useful tool in sequence analysis.

Pages: 349-356

Page count: 8
