LOCATING PROTEIN-CODING REGIONS IN HUMAN DNA-SEQUENCES BY A MULTIPLE SENSOR NEURAL NETWORK APPROACH

Citations: 547
Authors
UBERBACHER, EC
MURAL, RJ
Affiliations
[1] OAK RIDGE NATL LAB, DIV ENGN PHYS & MATH, OAK RIDGE, TN 37831 USA
[2] UNIV TENNESSEE, OAK RIDGE GRAD SCH BIOMED SCI, OAK RIDGE, TN 37830 USA
Keywords
CODING EXON LOCALIZATION; GENE STRUCTURE; PATTERN RECOGNITION; DNA SEQUENCE ANALYSIS
DOI
10.1073/pnas.88.24.11261
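Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]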
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Genes in higher eukaryotes may span tens or hundreds of kilobases with the protein-coding regions accounting for only a few percent of the total sequence. Identifying genes within large regions of uncharacterized DNA is a difficult undertaking and is currently the focus of many research efforts. We describe a reliable computational approach for locating protein-coding portions of genes in anonymous DNA sequence. Using a concept suggested by robotic environmental sensing, our method combines a set of sensor algorithms and a neural network to localize the coding regions. Several algorithms that report local characteristics of the DNA sequence, and therefore act as sensors, are also described. In its current configuration the "coding recognition module" identifies 90% of coding exons of length 100 bases or greater with less than one false positive coding exon indicated per five coding exons indicated. This is a significantly lower false positive rate than any method of which we are aware. This module demonstrates a method with general applicability to sequence-pattern recognition problems and is available for current research efforts.
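The sensor-plus-network idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three sensors (gc_fraction, open_frame_fraction, position_asymmetry), the window width and step, and the hand-picked weights are hypothetical, and a single logistic unit stands in for the trained neural network that the abstract mentions.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_fraction(window: str) -> float:
    """Illustrative sensor 1: fraction of G or C bases in the window."""
    return sum(base in "GC" for base in window) / len(window)


def open_frame_fraction(window: str) -> float:
    """Illustrative sensor 2: longest stop-free codon run, taken over the
    three forward frames, as a fraction of the codons in that frame."""
    best = 0.0
    for frame in range(3):
        codons = [window[i:i + 3] for i in range(frame, len(window) - 2, 3)]
        if not codons:
            continue
        longest = run = 0
        for codon in codons:
            run = 0 if codon in STOP_CODONS else run + 1
            longest = max(longest, run)
        best = max(best, longest / len(codons))
    return best


def position_asymmetry(window: str) -> float:
    """Illustrative sensor 3: how unevenly each base is spread over the
    three codon positions (0 means perfectly even)."""
    counts = {base: [0, 0, 0] for base in "ACGT"}
    for i, base in enumerate(window):
        if base in counts:
            counts[base][i % 3] += 1
    total = 0.0
    for per_position in counts.values():
        n = sum(per_position)
        if n:
            total += (max(per_position) - min(per_position)) / n
    return total / 4


def coding_score(window: str, weights=(1.0, 3.0, 2.0), bias=-3.0) -> float:
    """Combine the sensor outputs with one logistic unit.

    The weights and bias are hypothetical placeholders; a real system
    would learn them from labelled coding and non-coding sequence.
    """
    features = (
        gc_fraction(window),
        open_frame_fraction(window),
        position_asymmetry(window),
    )
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def scan(sequence: str, width: int = 99, step: int = 33):
    """Slide a fixed-width window along the sequence and score each one."""
    seq = sequence.upper()
    return [
        (start, coding_score(seq[start:start + width]))
        for start in range(0, len(seq) - width + 1, step)
    ]
```

Calling scan on an anonymous sequence returns (start, score) pairs, and windows whose score exceeds a chosen threshold would be reported as candidate coding regions. The point of the design is that each sensor reports only a local property of the sequence, and the combining step decides how much weight each report deserves.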
Pages: 11261-11265
Number of pages: 5