A Proposition for Sequence Mining Using Pattern Structures

Cited by: 6
Authors
Codocedo, Victor [1, 3]
Bosc, Guillaume [2]
Kaytoue, Mehdi [2]
Boulicaut, Jean-Francois [2]
Napoli, Amedeo [3]
Affiliations
[1] Inria Chile, Las Condes, Chile
[2] Univ Lyon, CNRS, INSA Lyon, LIRIS, Lyon, France
[3] Univ Lorraine, INRIA Nancy Grand Est, CNRS, LORIA, Nancy, France
DOI
10.1007/978-3-319-59271-8_7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this article, we present a novel approach to rare sequence mining using pattern structures. In particular, we are interested in mining closed sequences, a type of maximal sub-element that provides a succinct description of the patterns in a sequence database. We describe a sequence pattern structure model in which rare closed subsequences can be easily encoded. We also discuss and characterize the search space of closed sequences and, through the notion of sequence alignments, provide an intuitive implementation of a similarity operator for the sequence pattern structure based on directed acyclic graphs. Finally, we provide an experimental evaluation of our approach against state-of-the-art closed sequence mining algorithms, showing that it can largely outperform them when dealing with large regions of the search space.
Pages: 106 - 121
Page count: 16
Related papers
  • [1] Sequence Pattern Mining with Variables
    Okolica, James S.
    Peterson, Gilbert L.
    Mills, Robert F.
    Grimaila, Michael R.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (01) : 177 - 187
  • [2] Improved Multiple Sequence Alignments Using Coupled Pattern Mining
    Hossain, K. S. M. Tozammel
    Patnaik, Debprakash
    Laxman, Srivatsan
    Jain, Prateek
    Bailey-Kellogg, Chris
    Ramakrishnan, Naren
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2013, 10 (05) : 1098 - 1112
  • [3] A Proposition for Combining Pattern Structures and Relational Concept Analysis
    Codocedo, Victor
    Napoli, Amedeo
    FORMAL CONCEPT ANALYSIS, ICFCA 2014, 2014, 8478 : 96 - 111
  • [4] An incremental sequence pattern mining algorithm
    Fu, Zhongliang
    Chen, Nan
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2010, 35 (07) : 763 - 767
  • [5] Protein sequence pattern mining with constraints
    Ferreira, PG
    Azevedo, PJ
    KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2005, 2005, 3721 : 96 - 107
  • [6] Micro Sequence Identification of DNA Data Using Pattern Mining Techniques
    Surendar, A.
    Shaik, Sadulla
    Rani, N. Usha
    MATERIALS TODAY-PROCEEDINGS, 2018, 5 (01) : 578 - 587
  • [7] Mining Frequent Pattern within a Genetic Sequence Using Unique Pattern Indexing and Mapping Techniques
    Mutakabbir, Kazi Mahbub
    Mahin, Shah S.
    Hasan, Md Abid
    2014 INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV), 2014
  • [8] Demand-driven frequent itemset mining using pattern structures
    Haixun Wang
    Chang-Shing Perng
    Sheng Ma
    Philip S. Yu
    Knowledge and Information Systems, 2005, 8 : 82 - 102
  • [9] Demand-driven frequent itemset mining using pattern structures
    Wang, HX
    Perng, CS
    Ma, S
    Yu, PS
    KNOWLEDGE AND INFORMATION SYSTEMS, 2005, 8 (01) : 82 - 102
  • [10] Mining preserving structures in a graph sequence
    Uno, Takeaki
    Uno, Yushi
    THEORETICAL COMPUTER SCIENCE, 2016, 654 : 155 - 163