An optimal DNA segmentation based on the MDL principle

Cited by: 6
Authors
Szpankowski, W [1]
Ren, WH [1]
Szpankowski, L [1]
Affiliations
[1] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
DOI
10.1109/CSB.2003.1227402
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
The biological world is highly stochastic as well as inhomogeneous in its behavior. The transitions between homogeneous and inhomogeneous regions of DNA, also known as change points, carry important biological information. Our goal is to employ rigorous methods of information theory to quantify structural properties of DNA sequences. In particular, we adopt the Stein-Ziv lemma to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source while ensuring exponentially small false positives. Then we apply the Minimum Description Length (MDL) principle to select parameters of our segmentation algorithm. Finally, we perform extensive experimental work on human chromosome 9. After grouping A and G (purines) and T and C (pyrimidines), we discover change points between coding and noncoding regions as well as the beginning of a CpG island.
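The purine/pyrimidine grouping and the "same source" test mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's discriminant function: the Stein-Ziv-based statistic and the MDL parameter selection are not given in this record, so the sketch swaps in the weighted Jensen-Shannon divergence between empirical symbol frequencies as a simple stand-in. The function names, the `R`/`Y` labels, and the `threshold` parameter are all hypothetical.

```python
import math
from collections import Counter

# Assumption: purines (A, G) are relabeled "R" and pyrimidines (T, C) "Y".
PURINE_PYRIMIDINE = {"A": "R", "G": "R", "T": "Y", "C": "Y"}


def to_purine_pyrimidine(seq):
    """Reduce a DNA string to the two-letter purine/pyrimidine alphabet."""
    return "".join(PURINE_PYRIMIDINE[base] for base in seq.upper())


def entropy(counts, total):
    """Empirical Shannon entropy (in bits) of a symbol count table."""
    return -sum((c / total) * math.log2(c / total) for c in counts.values() if c > 0)


def js_divergence(left, right):
    """Weighted Jensen-Shannon divergence (bits) between two segments.

    Stand-in statistic, NOT the discriminant function from the paper.
    """
    if not left or not right:
        raise ValueError("both segments must be non-empty")
    n1, n2 = len(left), len(right)
    c1, c2 = Counter(left), Counter(right)
    n = n1 + n2
    return (entropy(c1 + c2, n)
            - (n1 / n) * entropy(c1, n1)
            - (n2 / n) * entropy(c2, n2))


def same_source(left, right, threshold):
    """Hypothetical decision rule: small divergence suggests a common source."""
    return js_divergence(left, right) <= threshold


# Usage: compare two adjacent windows after the purine/pyrimidine grouping.
left = to_purine_pyrimidine("AAAAGGGG")   # all purines
right = to_purine_pyrimidine("TTTTCCCC")  # all pyrimidines
print(js_divergence(left, right))
print(same_source(left, right, threshold=0.1))
```

In a segmentation setting, a statistic like this would be evaluated at candidate split points along the sequence, and a large value would mark a candidate change point. How the threshold is chosen is left open here; the abstract states that the authors select their algorithm's parameters with the MDL principle.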
Pages: 541-546
Number of pages: 6
Related papers
50 in total
  • [31] SIGNAL SEGMENTATION AND MODELLING BASED ON EQUIPARTITION PRINCIPLE
    Panagiotakis, Costas
    Tziritas, Georgios
    2009 16TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, VOLS 1 AND 2, 2009, : 1153 - 1158
  • [32] Irreducible form for AP algorithm for detecting the number of coherent signals based on the MDL principle
    Suzuki, M
    Sanada, H
    Nagai, N
    2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 2865 - 2868
  • [33] MULTI-REGULARIZATION PARAMETERS ESTIMATION FOR GAUSSIAN MIXTURE CLASSIFIER BASED ON MDL PRINCIPLE
    Zhou, Xiuling
    Guo, Ping
    Chen, C. L. Philip
    NCTA 2011: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON NEURAL COMPUTATION THEORY AND APPLICATIONS, 2011, : 112 - 117
  • [34] Constraint-based MDL principle for Semi-Supervised Classification of Time Series
    Vo Thanh Vinh
    Duong Tuan Anh
    2015 SEVENTH INTERNATIONAL CONFERENCE ON KNOWLEDGE AND SYSTEMS ENGINEERING (KSE), 2015, : 43 - 48
  • [35] FMRI baseline drift estimation method by MDL principle
    Bazargani, Negar
    Nosratinia, Aria
    Gopinath, Kaundinya
    Briggs, Richard W.
    2007 4TH IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING : MACRO TO NANO, VOLS 1-3, 2007, : 472 - +
  • [36] Ship speed optimization method combining Fisher optimal segmentation principle
    Li, Xiaohe
    Sun, Baozhi
    Jin, Jianhai
    Ding, Jun
    APPLIED OCEAN RESEARCH, 2023, 140
  • [37] A Novel Clustering-Based 1-NN Classification of Time Series Based on MDL Principle
    Vo Thanh Vinh
    Duong Tuan Anh
    RECENT DEVELOPMENTS IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2016, 642 : 29 - 40
  • [38] On the Performance Improvement of Microwave Imaging Using MDL Principle
    Ravan, Mohammad
    Nakhkash, Mansor
    Abouei, Jamshid
    2012 SIXTH INTERNATIONAL SYMPOSIUM ON TELECOMMUNICATIONS (IST), 2012, : 348 - 353
  • [39] Vouw: Geometric Pattern Mining Using the MDL Principle
    Faas, Micky
    van Leeuwen, Matthijs
    ADVANCES IN INTELLIGENT DATA ANALYSIS XVIII, IDA 2020, 2020, 12080 : 158 - 170
  • [40] Generalizing case frames using a thesaurus and the MDL principle
    Li, H
    Abe, N
    COMPUTATIONAL LINGUISTICS, 1998, 24 (02) : 217 - 244