Mining loosely structured motifs from biological data

Cited by: 14
Authors
Fassetti, Fabio [1]
Greco, Gianluigi [2]
Terracina, Giorgio [2]
Affiliations
[1] Univ Calabria, Dipartimento Elettron Informat & Sistemist, I-87036 Arcavacata Di Rende, CS, Italy
[2] Univ Calabria, Dipartimento Matemat, I-87036 Arcavacata Di Rende, CS, Italy
Keywords
data mining; bioinformatics (genome or protein) databases; mining methods and algorithms
DOI
10.1109/TKDE.2008.65
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The discovery of information encoded in biological sequences is assuming a prominent role in identifying genetic diseases and in deciphering biological mechanisms. This information is usually encoded in patterns that occur frequently in the sequences, also called motifs. Indeed, motif discovery has received much attention in the literature, and several algorithms have been proposed that are specifically tailored to motifs exhibiting some kind of "regular structure." Motivated by biological observations, this paper focuses on mining loosely structured motifs, i.e., more general kinds of motifs in which several "exceptions" may be tolerated in pattern repetitions. To this end, an algorithm exploiting data structures conceived to efficiently handle pattern variability is presented and analyzed. Furthermore, a randomized variant with linear time and space complexity is introduced, and a theoretical guarantee on its performance is proven. Both algorithms have been implemented and tested on real data sets. Despite their ability to mine very complex kinds of patterns, the performance results show that the proposed techniques are applicable genome-wide.
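As a rough illustration of what a structured motif with tolerated "exceptions" can look like, the following Python sketch models a motif as an ordered list of boxes (short strings) separated by gaps whose length must fall within a given range, and tolerates up to a fixed number of substitutions inside each box. The box/gap model, the use of a per-box mismatch threshold as the form of "exception", and all names (`hamming`, `box_hits`, `occurs`, `support`) are assumptions made for illustration. This is a naive brute-force matcher, not the authors' algorithm, their data structures, or their randomized linear-time variant.

```python
from typing import Iterable, List, Sequence, Tuple


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def box_hits(seq: str, box: str, max_mismatches: int) -> List[int]:
    """Start positions where `box` matches `seq` with at most `max_mismatches` substitutions."""
    k = len(box)
    return [i for i in range(len(seq) - k + 1)
            if hamming(seq[i:i + k], box) <= max_mismatches]


def occurs(seq: str, boxes: Sequence[str], gaps: Sequence[Tuple[int, int]],
           max_mismatches: int) -> bool:
    """True if the boxes occur in order in `seq`, where the gap between the end of
    box j and the start of box j+1 lies in gaps[j] = (dmin, dmax).

    Hypothetical model: each box may contain up to `max_mismatches` substitutions.
    """
    if len(gaps) != len(boxes) - 1:
        raise ValueError("need exactly one gap range between consecutive boxes")
    hits = [set(box_hits(seq, b, max_mismatches)) for b in boxes]
    # Start positions of the current box that end a valid chain of earlier boxes.
    frontier = hits[0]
    for j in range(1, len(boxes)):
        dmin, dmax = gaps[j - 1]
        prev_end_offset = len(boxes[j - 1])
        frontier = {p + prev_end_offset + d
                    for p in frontier
                    for d in range(dmin, dmax + 1)} & hits[j]
        if not frontier:
            return False
    return bool(frontier)


def support(sequences: Iterable[str], boxes: Sequence[str],
            gaps: Sequence[Tuple[int, int]], max_mismatches: int) -> int:
    """Number of input sequences containing at least one occurrence of the motif."""
    return sum(occurs(s, boxes, gaps, max_mismatches) for s in sequences)
```

For example, `support(["ACGTTTTGGCA", "ACGAAGGCA", "TTTTTTTT"], ["ACG", "GGC"], [(2, 4)], 0)` returns 2: the first two sequences contain `ACG` followed by `GGC` after a gap of 4 and 2 positions respectively, while the third contains neither box. A motif would then be reported as frequent when its support reaches a chosen quorum; the brute-force enumeration here is exactly the kind of cost that dedicated data structures and randomization aim to avoid.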
Pages: 1472-1489
Number of pages: 18
Related papers
50 items in total
  • [31] Data mining and the evolution of biological complexity
    Davnah Urbach
    Jason H Moore
    [J]. BioData Mining, 4
  • [33] The spatial dimension in biological data mining
    Urbach, Davnah
    Moore, Jason H.
    [J]. BIODATA MINING, 2011, 4
  • [34] Deep Learning in Mining Biological Data
    Mufti Mahmud
    M. Shamim Kaiser
    T. Martin McGinnity
    Amir Hussain
    [J]. Cognitive Computation, 2021, 13 : 1 - 33
  • [35] INFORMATION SEARCH STRATEGIES IN LOOSELY STRUCTURED SETTINGS
    CHANG, CK
    MCDANIEL, ED
    [J]. JOURNAL OF EDUCATIONAL COMPUTING RESEARCH, 1995, 12 (01) : 95 - 107
  • [36] Mining pathway signatures from microarray data and relevant biological knowledge
    Panteris, Eleftherios
    Swift, Stephen
    Payne, Annette
    Liu, Xiaohui
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2007, 40 (06) : 698 - 706
  • [37] AN ON LINE SYSTEM FOR PROCESSING LOOSELY STRUCTURED RECORDS
    DOBBERT, GA
    [J]. HISTORICAL METHODS, 1982, 15 (01): 16 - 22
  • [38] A polynomial time matching algorithm of structured ordered tree patterns for data mining from semistructured data
    Suzuki, Y
    Inomae, K
    Shoudai, T
    Miyahara, T
    Uchida, T
    [J]. INDUCTIVE LOGIC PROGRAMMING, 2003, 2583 : 270 - 284
  • [39] Algorithm for classification of biological data based on data mining
    Garcia, Eduardo Moniz
    Fonseca, Simone A. S.
    Beingolea, Jorge R.
    [J]. PROCEEDINGS OF THE 2019 IEEE 1ST SUSTAINABLE CITIES LATIN AMERICA CONFERENCE (SCLA), 2019
  • [40] Research on the Data Model and the Approaches to Data Mining in the Semi-structured Data
    Liu, Fenghua
    [J]. APPLIED SCIENCE, MATERIALS SCIENCE AND INFORMATION TECHNOLOGIES IN INDUSTRY, 2014, 513-517 : 663 - 666