An exact data mining method for finding center strings and all their instances

Times cited: 7
Authors
Lu, Ruqian [1]
Jia, Caiyan
Zhang, Shaofang
Chen, Lusheng
Zhang, Hongyu
Affiliations
[1] Acad Sinica, Inst Math, AMSS, Beijing 100080, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Acad Sinica, Inst Comp Technol, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data mining; frequent pattern; common approximate substring; center string; Bpriori algorithm
DOI
10.1109/TKDE.2007.1001
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Common substring problems allowing errors are known to be NP-hard. The main challenge of these problems lies in the combinatorial explosion of potential candidates. In this paper, we propose and study a Generalized Center String (GCS) problem, in which we search not only for all models (center strings) of any length, but also for the positions of all their (degenerative) instances in the input sequences. Inspired by frequent pattern mining techniques in the data mining field, we present an exact and efficient method to solve GCS. First, we propose a highly parallelized TRIE-like structure, the consensus tree. Based on this structure, we present three Bpriori algorithms step by step. The Bpriori algorithms can solve GCS with reasonable time and/or space complexity. We prove that GCS is fixed-parameter tractable with respect to a fixed symbol set size and a fixed length of input sequences. Experimental results on both artificial and real data show the correctness of the algorithms and the validity of our complexity analysis. A comparison with some current algorithms for solving Common Approximate Substring problems is also given.
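The abstract states the GCS problem only informally, so the following is a hedged illustration of the general idea, not the authors' consensus tree or Bpriori algorithms. This Python sketch assumes a simplified formulation: a center string is any string of length at most `max_len` that has an instance (a substring within Hamming distance `max_mismatch` of it) in at least `min_support` input sequences. It relies on an Apriori-style anti-monotone property: the prefix of an instance is an instance of the prefix, so a candidate that fails the support threshold never needs to be extended. The function `mine_center_strings` and all of its parameters are hypothetical names introduced here.

```python
from typing import Dict, List, Tuple

# (sequence index, start position, mismatches accumulated so far)
Occurrence = Tuple[int, int, int]


def mine_center_strings(
    seqs: List[str],
    alphabet: str,
    max_mismatch: int,
    min_support: int,
    max_len: int,
) -> Dict[str, List[Occurrence]]:
    """Hypothetical simplified GCS variant (not the paper's Bpriori algorithm).

    Returns every string of length 1..max_len over `alphabet` that has an
    instance (a substring within Hamming distance `max_mismatch`) in at least
    `min_support` distinct sequences, mapped to all of its instances.
    """
    results: Dict[str, List[Occurrence]] = {}

    # The empty center string occurs at every start position with 0 mismatches.
    root = [(i, p, 0) for i, s in enumerate(seqs) for p in range(len(s))]

    def grow(center: str, occs: List[Occurrence]) -> None:
        depth = len(center)
        if depth == max_len:
            return
        for symbol in alphabet:
            extended: List[Occurrence] = []
            for i, p, m in occs:
                pos = p + depth
                if pos >= len(seqs[i]):
                    continue  # the instance would run past the sequence end
                m2 = m + (seqs[i][pos] != symbol)
                if m2 <= max_mismatch:
                    extended.append((i, p, m2))
            # Apriori-style pruning: a prefix of an instance is an instance of
            # the prefix, so support never grows as the center is extended.
            if len({i for i, _, _ in extended}) >= min_support:
                child = center + symbol
                results[child] = extended
                grow(child, extended)  # go one level deeper in the implicit trie

    grow("", root)
    return results


if __name__ == "__main__":
    found = mine_center_strings(["ACGT", "ACCT", "TCGT"], "ACGT", 1, 3, 4)
    print(found["ACGT"])
```

Running the module prints the instances of "ACGT": position 0 in each of the three sequences, with 0, 1 and 1 mismatches respectively. Because each extension reuses the parent's instance list, the search is exact for this simplified formulation, but the number of surviving candidates can still grow as |alphabet|^max_len in the worst case, which is the kind of combinatorial explosion the abstract refers to.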
Pages: 509-522
Page count: 14