Mathematical Algorithm for Identification of Eukaryotic Promoter Sequences

Cited by: 6
Authors
Korotkov, Eugene V. [1]
Suvorova, Yulia M. [1]
Nezhdanova, Anna V. [1]
Gaidukova, Sofia E. [1]
Yakovleva, Irina V. [1]
Kamionskaya, Anastasia M. [1]
Korotkova, Maria A. [2]
Affiliations
[1] Russian Acad Sci, Inst Bioengn, Fed Res Ctr Biotechnol, Moscow 119071, Russia
[2] Natl Res Nucl Univ MEPhI, Inst Cyber Intelligence Syst, Moscow Engn Phys Inst, Moscow 115409, Russia
Source
SYMMETRY-BASEL | 2021 / Volume 13 / Issue 06
Keywords
promoter; rice genome; dynamic programming; base correlation; TRANSCRIPTION; RECOGNITION; ANNOTATION; ELEMENTS; PROTEIN; GENOMES; GENES;
DOI
10.3390/sym13060917
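Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code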
07 ; 0710 ; 09 ;
Abstract
Computational identification of promoter sequences in eukaryotic genomes is an important task in bioinformatics. However, the problem remains unsolved: the best algorithms have a false positive probability of 10^-3 to 10^-4 per nucleotide, so a full-genome analysis may yield more false positives than annotated gene promoters. The false positive probability should be reduced to 10^-6 to 10^-8 per nucleotide to lower the number of false positives and increase the reliability of the prediction. A method for multiple alignment of promoter sequences was developed. Mathematical methods were then developed to calculate statistically significant classes of promoter sequences. Five promoter classes were created from the rice genome; they contain 1740, 222, 199, 167 and 130 promoters, respectively. These classes were used to search the rice genome for potential promoter sequences (PPSs) at a false positive rate below 10^-8 per nucleotide. A total of 145,277 PPSs were identified. Of these, 18,563 are promoters of known genes, 87,233 intersect with transposable elements, and 37,390 were found in previously unannotated sequences. For a randomly shuffled rice genome, the false positive rate is less than 10^-8 per nucleotide. The method developed for detecting PPSs was compared with some previously used approaches. The developed mathematical method can be used to search for genes, transposable elements, and transcription start sites in eukaryotic genomes.
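The abstract's argument about false positive rates is easier to follow as arithmetic. The sketch below is illustrative only and is not the authors' code: it multiplies a per-nucleotide false positive rate by a genome length to get an expected number of false hits. The genome length of roughly 373 Mbp is an assumption (the abstract does not state it), and the function name `expected_false_positives` is hypothetical.

```python
# Assumed rice genome length in base pairs; NOT given in the abstract.
ASSUMED_GENOME_LENGTH_BP = 373_000_000


def expected_false_positives(rate_per_nt: float,
                             genome_length_bp: int = ASSUMED_GENOME_LENGTH_BP) -> float:
    """Expected number of false hits when every position is tested once.

    Simplification: one scan of a single strand, with independent positions.
    """
    return rate_per_nt * genome_length_bp


if __name__ == "__main__":
    # Rates quoted in the abstract: current algorithms (1e-3, 1e-4)
    # and the stated target range (1e-6, 1e-8).
    for rate in (1e-3, 1e-4, 1e-6, 1e-8):
        count = expected_false_positives(rate)
        print(f"rate {rate:.0e} per nt: about {count:,.0f} expected false positives")
```

Under this assumed length, a rate of 10^-3 per nucleotide corresponds to hundreds of thousands of expected false hits, while 10^-8 corresponds to only a handful. This is the gap the abstract refers to when it says false positives can outnumber annotated promoters.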
Pages: 19