Novel methodologies for spectral classification of exon and intron sequences

Cited by: 14
Authors
Kwan, Hon Keung [1]
Kwan, Benjamin Y. M. [2]
Kwan, Jennifer Y. Y. [3]
Affiliations
[1] Univ Windsor, Dept Elect & Comp Engn, Windsor, ON N9B 3P4, Canada
[2] Univ Ottawa, Fac Med, Ottawa, ON K1H 8M5, Canada
[3] Queens Univ, Sch Med, Kingston, ON K7L 3N6, Canada
Keywords
DNA sequence; numerical representation; nucleotide to numeric mapping; exon and intron sequences; coding and non-coding sequences; threshold value; thresholding; exon and intron classification; period-3; spectral analysis; discrete Fourier transform; gene detection; genome annotation; DNA; REPRESENTATION; PREDICTION; GALAXY
DOI
10.1186/1687-6180-2012-50
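Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification codes
0808; 0809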
Abstract
Digital processing of a nucleotide sequence requires mapping it to a numerical sequence, and the choice of nucleotide-to-numeric mapping affects how well the sequence's biological properties are preserved and reflected in the numerical domain. Digital spectral analysis of nucleotide sequences reveals a period-3 power spectral value that is more prominent in an exon sequence than in an intron sequence. The success of period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them with existing codes to determine an appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows. Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window lengths of 9, 15, and 24 bases) offers a possible speed gain over the non-windowed 4-sequence Voss representation, and this gain increases as sequence length increases. A winner threshold value (the best among two defined threshold values and one other threshold value) offers the top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value, applicable to an unknown sequence of arbitrary length, can be estimated from the winner threshold values of fixed-length sequences with comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences that better reveals embedded properties, and it has potential applications in improved genome annotation.
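The period-3 idea in the abstract can be illustrated with a minimal sketch. It uses the 4-sequence Voss representation (one binary indicator sequence per base), which the abstract names as the baseline, and evaluates the discrete Fourier transform only at the bin k = N/3. The function names `period3_power` and `classify`, the normalization by N squared, and the threshold of 0.1 are assumptions made for illustration; the article's K-Quaternary codes, windowing, and winner threshold values are not given in this record and are not reproduced here.

```python
import cmath


def period3_power(seq):
    """Normalized power at DFT bin k = N/3, summed over Voss indicator sequences.

    Assumption: the sequence is trimmed so that N is a multiple of 3 and the
    result is divided by N**2 so that values for different lengths are comparable.
    """
    seq = seq.upper()
    n = len(seq) - len(seq) % 3  # trim to a multiple of 3
    if n == 0:
        raise ValueError("sequence must contain at least 3 bases")
    k = n // 3
    total = 0.0
    for base in "ACGT":
        # DFT of the binary indicator sequence for this base, at bin k only.
        x_k = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, b in enumerate(seq[:n])
            if b == base
        )
        total += abs(x_k) ** 2
    return total / n ** 2


def classify(seq, threshold):
    """Label a sequence 'exon' if its period-3 power reaches the threshold.

    The threshold is a hypothetical value supplied by the caller.
    """
    return "exon" if period3_power(seq) >= threshold else "intron"


if __name__ == "__main__":
    periodic = "ATG" * 10  # perfectly 3-periodic toy sequence
    flat = "A" * 30        # no period-3 component
    print(period3_power(periodic), classify(periodic, 0.1))
    print(period3_power(flat), classify(flat, 0.1))
```

The toy inputs are extreme cases chosen to make the contrast visible; real exon and intron sequences differ far less sharply, which is why the choice of threshold matters.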
Pages: 14
Related papers
50 in total
  • [1] Novel methodologies for spectral classification of exon and intron sequences
    Hon Keung Kwan
    Benjamin Y M Kwan
    Jennifer Y Y Kwan
    EURASIP Journal on Advances in Signal Processing, 2012
  • [2] Spectral classification of short numerical exon and intron sequences
    Kwan, Benjamin Y. M.
    Kwan, Jennifer Y. Y.
    Kwan, Hon Keung
    BMC BIOINFORMATICS, 2011, 12 : 1 - 2
  • [3] Spectral classification of short numerical exon and intron sequences
    Benjamin YM Kwan
    Jennifer YY Kwan
    Hon Keung Kwan
    BMC Bioinformatics, 12
  • [4] Spectral Analysis of Numerical Exon and Intron Sequences
    Kwan, Jennifer Yin Yee
    Kwan, Benjamin Yin Ming
    Kwan, Hon Keung
    2010 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE WORKSHOPS (BIBMW), 2010, : 876 - 877
  • [5] Spectral Techniques for Classifying Short Exon and Intron Sequences
    Kwan, Benjamin Y. M.
    Kwan, Jennifer Y. Y.
    Kwan, Hon Keung
    2012 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS (ISCAS 2012), 2012, : 568 - 571
  • [6] BioSPRINT: Classification of intron and exon sequences using the SPRINT algorithm
    Crosby, K
    Gabbert, P
    2004 IEEE COMPUTATIONAL SYSTEMS BIOINFORMATICS CONFERENCE, PROCEEDINGS, 2004, : 668 - 669
  • [7] FRACTAL DIMENSION OF EXON AND INTRON SEQUENCES
    XIAO, Y
    CHEN, RS
    SHEN, RQ
    SUN, J
    XU, J
    JOURNAL OF THEORETICAL BIOLOGY, 1995, 175 (01) : 23 - 26
  • [8] Classification of Exon and Intron Regions on DNA Sequences with Hybrid Use of SBERT and ANFIS Approaches
    Akalin, Fatma
    Yumusak, Nejat
    JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, 2024, 27 (03):
  • [9] Parallel cascade recognition of exon and intron DNA sequences
    Korenberg, MJ
    Lipson, ED
    Green, JR
    Solomon, JE
    ANNALS OF BIOMEDICAL ENGINEERING, 2002, 30 (01) : 129 - 140
  • [10] Parallel Cascade Recognition of Exon and Intron DNA Sequences
    Michael J. Korenberg
    Edward D. Lipson
    James R. Green
    Jerry E. Solomon
    Annals of Biomedical Engineering, 2002, 30 : 129 - 140