Efficient Discovery of Sequence Outlier Patterns

Cited by: 11
Authors
Cao, Lei [1]
Yan, Yizhou [2]
Madden, Samuel [1]
Rundensteiner, Elke A. [2]
Gopalsamy, Mathan [3]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Worcester Polytech Inst, Worcester, MA 01609 USA
[3] Signify Res, Cambridge, MA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019 / Vol. 12 / Issue 08
Keywords
FREQUENT; ALGORITHMS;
DOI
10.14778/3324301.3324308
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Modern Internet of Things (IoT) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To solve the fundamental limitation of existing pattern mining semantics that miss outlier patterns hidden inside larger frequent patterns, TOP offers new pattern semantics based on contextual patterns that distinguish the independent occurrence of a pattern from its occurrence as part of its super-pattern. We present efficient algorithms for the mining of this new class of contextual patterns. In particular, in contrast to the bottom-up strategy of state-of-the-art pattern mining techniques, our top-down Reduce strategy piggybacks pattern detection on the detection of the context in which a pattern occurs. Our approach achieves linear time complexity in the length of the input sequence. Effective optimization techniques such as context-driven search space pruning and inverted index-based outlier pattern detection are also proposed to further speed up contextual pattern mining. Our experimental evaluation demonstrates the effectiveness of TOP at capturing meaningful outlier patterns in several real-world IoT use cases. We also demonstrate the efficiency of TOP, showing it to be up to 2 orders of magnitude faster than adapting state-of-the-art mining to produce this new class of contextual outlier patterns, allowing us to scale outlier pattern mining to large sequence datasets.
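The distinction the abstract draws between an independent occurrence of a pattern and an occurrence inside a super-pattern can be illustrated with a minimal sketch. This is not the TOP algorithm or its Reduce strategy: it assumes contiguous matching on a string of single-character symbols, takes the super-pattern as given rather than mining it, and the function names `find_occurrences` and `independent_occurrences` are hypothetical.

```python
def find_occurrences(seq, pattern):
    """Return start indices of every contiguous occurrence of pattern in seq."""
    n, m = len(seq), len(pattern)
    return [i for i in range(n - m + 1) if seq[i:i + m] == pattern]


def independent_occurrences(seq, pattern, super_pattern):
    """Return start indices where pattern occurs outside any super_pattern occurrence.

    Simplifying assumption: an occurrence counts as "part of the super-pattern"
    when its span lies entirely within a span matched by super_pattern.
    """
    covered = [(s, s + len(super_pattern))
               for s in find_occurrences(seq, super_pattern)]
    result = []
    for start in find_occurrences(seq, pattern):
        end = start + len(pattern)
        if not any(s <= start and end <= e for s, e in covered):
            result.append(start)
    return result


# "AB" occurs three times, but two of those sit inside "ABCD".
events = "ABCDxxABxxABCD"
print(find_occurrences(events, "AB"))                 # [0, 6, 10]
print(independent_occurrences(events, "AB", "ABCD"))  # [6]
```

Under these assumptions, a plain frequency count treats "AB" as occurring three times, while the contextual view isolates the single occurrence at index 6 that is not explained by the larger pattern "ABCD".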
Pages: 920 - 932
Number of pages: 13