An exact data mining method for finding center strings and all their instances

Cited by: 7
Authors
Lu, Ruqian [1]
Jia, Caiyan
Zhang, Shaofang
Chen, Lusheng
Zhang, Hongyu
Affiliations
[1] Acad Sinica, Inst Math, AMSS, Beijing 100080, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[3] Acad Sinica, Inst Comp Technol, Beijing 100080, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
data mining; frequent pattern; common approximate substring; center string; Bpriori algorithm
DOI
10.1109/TKDE.2007.1001
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Common substring problems allowing errors are known to be NP-hard. The main challenge lies in the combinatorial explosion of potential candidates. In this paper, we propose and study a Generalized Center String (GCS) problem, in which we search not only for all models (center strings) of any length, but also for the positions of all their (degenerative) instances in the input sequences. Inspired by frequent pattern mining techniques from the data mining field, we present an exact and efficient method to solve GCS. First, we propose a highly parallelized TRIE-like structure, the consensus tree. Based on this structure, we present three Bpriori algorithms step by step. The Bpriori algorithms can solve GCS with reasonable time and/or space complexity. We prove that GCS is fixed-parameter tractable with respect to a fixed symbol set size and a fixed length of the input sequences. Experimental results on both artificial and real data show the correctness of the algorithms and the validity of our complexity analysis. A comparison with some current algorithms for solving Common Approximate Substring problems is also given.
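To make the problem statement concrete, the following is a minimal brute-force sketch of what a GCS-style search asks for. It is not the Bpriori algorithm or the consensus tree from the paper. It rests on assumptions the abstract does not state: instances are compared by Hamming distance, a model must have at least one instance in every input sequence, and a single model length is searched at a time. The names `hamming`, `center_strings`, `max_dist`, and `alphabet` are hypothetical and chosen only for illustration.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def center_strings(sequences, length, max_dist, alphabet):
    """Enumerate every candidate model of the given length (assumed setup).

    Returns a dict mapping each model to a list with one entry per input
    sequence; each entry lists the start positions of that model's
    instances (windows within max_dist mismatches) in that sequence.
    A model is kept only if every sequence contains at least one instance.
    """
    results = {}
    # Exhaustive enumeration: |alphabet| ** length candidates, which is
    # exactly the combinatorial explosion the abstract refers to.
    for letters in product(alphabet, repeat=length):
        model = "".join(letters)
        occurrences = []
        for seq in sequences:
            hits = [
                i
                for i in range(len(seq) - length + 1)
                if hamming(model, seq[i:i + length]) <= max_dist
            ]
            if not hits:
                break  # no instance in this sequence: discard the model
            occurrences.append(hits)
        else:
            results[model] = occurrences
    return results


if __name__ == "__main__":
    seqs = ["ACGT", "ACCT", "TCGT"]
    found = center_strings(seqs, length=3, max_dist=1, alphabet="ACGT")
    for model, positions in sorted(found.items()):
        print(model, positions)
```

The sketch enumerates all candidates, so its cost grows exponentially with the model length; it only illustrates the input and output of the problem under the stated assumptions.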
Pages: 509-522
Page count: 14