Mining Contiguous Sequential Generators in Biological Sequences

Cited by: 19
Authors
Zhang, Jingsong [1]
Wang, Yinglin [2]
Zhang, Chao [3]
Shi, Yongyong [4]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Univ Finance & Econ, Dept Comp Sci & Technol, Shanghai 200433, Peoples R China
[3] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[4] Shanghai Jiao Tong Univ, BioX Inst, Shanghai 200240, Peoples R China
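Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);
Keywords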
Sequential pattern mining; closed sequential pattern; sequential generator; contiguous sequential generator; DNA sequence; protein sequence; motif finding; EFFICIENT ALGORITHMS; MOTIF ANALYSIS; PATTERNS;
DOI
10.1109/TCBB.2015.2495132
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
The discovery of conserved sequential patterns in biological sequences is essential to unveiling common shared functions. Mining sequential generators, like mining closed sequential patterns, yields a more concise result set than mining all sequential patterns, especially in the analysis of big data in bioinformatics. Previous studies have also presented convincing arguments that the generator is preferable to the closed pattern in inductive inference and classification. However, because classic sequential generator mining algorithms do not consider the contiguous constraint together with the lower-closed one, they still produce a large number of inefficient and redundant patterns, too many for effective use. Driven by the wide range of applications of patterns with the contiguous feature, we propose ConSgen, an efficient algorithm for discovering contiguous sequential generators. It adopts the n-gram model, called shingles, to generate potential frequent subsequences and leverages several pruning techniques to prune the unpromising parts of the search space. The contiguous sequential generators are then identified by using the equivalence class-based lower-closure checking scheme. Our experiments on both DNA and protein data sets demonstrate the compactness, efficiency, and scalability of ConSgen.
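The idea in the abstract can be illustrated with a minimal Python sketch. It is not the ConSgen implementation: the function names `shingles` and `contiguous_generators` are hypothetical, the paper's pruning techniques and equivalence class-based checking scheme are replaced here by a plain level-wise count, and the sketch assumes that a frequent contiguous pattern is a generator when no non-empty proper contiguous subsequence has the same support. Because support cannot increase as a pattern grows, comparing against the two sub-patterns obtained by dropping the first or the last symbol is enough under that assumption.

```python
from collections import defaultdict


def shingles(seq, n):
    """Return the set of distinct contiguous n-grams (shingles) of seq."""
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def contiguous_generators(db, min_sup):
    """Level-wise sketch (hypothetical helper, not the paper's code).

    db is a list of strings; support of a pattern is the number of
    sequences that contain it as a contiguous substring.
    Returns {pattern: support} for the contiguous generators.
    """
    support = {}      # support of every frequent pattern found so far
    generators = {}
    longest = max(map(len, db), default=0)
    for n in range(1, longest + 1):
        counts = defaultdict(int)
        for seq in db:
            for g in shingles(seq, n):
                # Count g only if both (n-1)-length sub-patterns are frequent.
                if n == 1 or (g[:-1] in support and g[1:] in support):
                    counts[g] += 1
        level = {g: c for g, c in counts.items() if c >= min_sup}
        if not level:
            break  # no frequent pattern of length n, so none longer either
        for g, c in level.items():
            # Assumed generator test: neither immediate sub-pattern
            # has the same support as g.
            if n == 1 or (support[g[:-1]] != c and support[g[1:]] != c):
                generators[g] = c
        support.update(level)
    return generators


if __name__ == "__main__":
    toy_db = ["ACG", "ACT", "GCA", "TAG"]  # made-up toy sequences
    result = contiguous_generators(toy_db, min_sup=2)
    assert result == {"A": 4, "C": 3, "G": 3, "T": 2, "AC": 2}
```

In the toy database, `AC` occurs in two sequences while `A` and `C` occur in four and three, so `AC` is kept as a generator. In a database such as `["ACGT", "ACGA", "TACG"]`, `ACG` has the same support as `AC` and is therefore not reported.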
Pages: 855-867
Number of pages: 13