Data mining for motifs in DNA sequences

Cited by: 0
Authors
Bell, DA [1]
Guan, JW
Affiliations
[1] Queens Univ Belfast, Sch Comp Sci, Belfast BT7 1NN, Antrim, North Ireland
[2] Jilin Univ, Coll Comp Sci & Technol, Changchun 130012, Peoples R China
Keywords
rough sets; data mining; knowledge discovery in databases; gene expression; bioinformatics
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The large collections of genomic information accumulated in recent years potentially hold significant knowledge for exploitation in medicine and in the pharmaceutical industry. One interesting approach to distilling such knowledge is to detect strings in DNA sequences that are highly repetitive within a given sequence (e.g., for a particular patient) or across sequences (e.g., from different patients who have been classified in some way, such as sharing a particular medical diagnosis). Motifs are strings that occur relatively frequently. In this paper we present basic theory and algorithms for finding such frequent and common strings. We are particularly interested in strings that are maximally frequent and, having discovered very frequent motifs, we show how to mine association rules using an existing rough-sets-based technique. Further work and applications are in progress.
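
To make the notion of frequent and maximally frequent strings concrete, the following is a minimal Python sketch. It is not the authors' algorithm, which this record does not reproduce. It assumes that a string's support is the number of input sequences containing it as a contiguous substring, and that a frequent string is maximal when no one-character extension of it is also frequent. The function names `frequent_substrings` and `maximal_frequent`, the `min_support` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict


def frequent_substrings(sequences, min_support):
    """Return {substring: support} for every substring that occurs in at
    least `min_support` of the sequences (assumed definition of support:
    the number of sequences containing the substring)."""
    frequent = {}
    k = 1
    while True:
        support = defaultdict(int)
        for seq in sequences:
            # Use a set so each sequence counts at most once per substring.
            seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
            for s in seen:
                # Level-wise pruning: a string can only be frequent if its
                # prefix and suffix of length k-1 are both frequent.
                if k == 1 or (s[:-1] in frequent and s[1:] in frequent):
                    support[s] += 1
        level = {s: n for s, n in support.items() if n >= min_support}
        if not level:
            return frequent
        frequent.update(level)
        k += 1


def maximal_frequent(frequent):
    """Keep only frequent strings with no frequent one-character extension."""
    extended = set()
    for s in frequent:
        if len(s) > 1:
            extended.add(s[:-1])  # s extends this string on the right
            extended.add(s[1:])   # s extends this string on the left
    return {s: n for s, n in frequent.items() if s not in extended}


# Toy data, invented for illustration only.
sequences = ["ACGTACGT", "TTACGA", "GGACGT"]
motifs = maximal_frequent(frequent_substrings(sequences, min_support=3))
print(motifs)  # {'T': 3, 'ACG': 3}
```

Checking only one-character extensions is sufficient under these assumptions, because every substring of a frequent string is itself frequent. Mining association rules from the discovered motifs, as the abstract describes, is outside the scope of this sketch.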
Pages: 507-514
Page count: 8