Mining loosely structured motifs from biological data

Times cited: 14
Authors
Fassetti, Fabio [1]
Greco, Gianluigi [2]
Terracina, Giorgio [2]
Affiliations
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemist, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, Dipartimento Matemat, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
data mining; bioinformatics (genome or protein) databases; mining methods and algorithms
DOI
10.1109/TKDE.2008.65
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The discovery of information encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually encoded in patterns that occur frequently in the sequences, which are also called motifs. Motif discovery has received much attention in the literature, and several algorithms have already been proposed that are specifically tailored to motifs exhibiting some kind of "regular structure." Motivated by biological observations, this paper focuses on mining loosely structured motifs, i.e., more general kinds of motif in which several "exceptions" may be tolerated in pattern repetitions. To this end, an algorithm exploiting data structures conceived to efficiently handle pattern variability is presented and analyzed. Furthermore, a randomized variant with linear time and space complexity is introduced, and a theoretical guarantee on its performance is proven. Both algorithms have been implemented and tested on real data sets. Despite their ability to mine very complex kinds of patterns, the performance results show that the proposed techniques are applicable at genome scale.
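As an illustration only: the abstract does not spell out how a motif occurrence is matched, so the sketch below assumes a generic structured-motif model rather than the paper's own definition of loosely structured motifs. Under this assumption, a motif is a list of boxes (short strings) separated by gaps whose lengths lie in given [min, max] ranges, and each box may match with up to `max_err` mismatches (Hamming distance). The function names `hamming`, `occurs_at`, and `support` and the parameter `max_err` are hypothetical, and the brute-force search shown here is not the authors' algorithm or data structure.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def occurs_at(seq: str, boxes: List[str], gaps: List[Tuple[int, int]],
              max_err: int, pos: int) -> bool:
    """Return True if the motif can be matched starting at seq[pos].

    Assumed model (hypothetical): gaps[i] = (lo, hi) bounds the gap between
    boxes[i] and boxes[i + 1]; each box tolerates up to max_err mismatches.
    """
    starts = {pos}  # candidate start positions for the current box
    for i, box in enumerate(boxes):
        ends = set()
        for s in starts:
            window = seq[s:s + len(box)]
            # The window must fit inside the sequence and be close enough to the box.
            if len(window) == len(box) and hamming(window, box) <= max_err:
                ends.add(s + len(box))
        if not ends:
            return False  # no way to place this box, so no occurrence here
        if i < len(gaps):
            lo, hi = gaps[i]
            # Every admissible gap length yields a start for the next box.
            starts = {e + g for e in ends for g in range(lo, hi + 1)}
    return True


def support(seqs: List[str], boxes: List[str], gaps: List[Tuple[int, int]],
            max_err: int) -> int:
    """Number of sequences containing at least one occurrence of the motif."""
    return sum(
        any(occurs_at(s, boxes, gaps, max_err, p) for p in range(len(s)))
        for s in seqs
    )
```

For example, with boxes `["ACG", "GGC"]`, one gap of length 1 to 4, and no mismatches, the sequences `AACGTTTTGGCA` and `ACGAAGGCT` each contain an occurrence while `TTTTTT` does not, so `support` returns 2. Tracking sets of positions keeps the sketch simple; it enumerates every start position and gap length, which is exactly the cost that specialized data structures are meant to avoid.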
Pages: 1472-1489
Page count: 18