Efficient Discovery of Sequence Outlier Patterns

Cited by: 11
Authors
Cao, Lei [1]
Yan, Yizhou [2]
Madden, Samuel [1]
Rundensteiner, Elke A. [2]
Gopalsamy, Mathan [3]
Affiliations
[1] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] Worcester Polytech Inst, Worcester, MA 01609 USA
[3] Signify Res, Cambridge, MA USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2019 / Vol. 12 / No. 8
Keywords
FREQUENT; ALGORITHMS;
DOI
10.14778/3324301.3324308
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Modern Internet of Things (IoT) applications generate massive amounts of time-stamped data, much of it in the form of discrete, symbolic sequences. In this work, we present a new system called TOP that deTects Outlier Patterns from these sequences. To solve the fundamental limitation of existing pattern mining semantics that miss outlier patterns hidden inside larger frequent patterns, TOP offers new pattern semantics based on contextual patterns that distinguish the independent occurrence of a pattern from its occurrence as part of its super-pattern. We present efficient algorithms for the mining of this new class of contextual patterns. In particular, in contrast to the bottom-up strategy of state-of-the-art pattern mining techniques, our top-down Reduce strategy piggybacks pattern detection on the detection of the context in which a pattern occurs. Our approach achieves linear time complexity in the length of the input sequence. Effective optimization techniques such as context-driven search space pruning and inverted index-based outlier pattern detection are also proposed to further speed up contextual pattern mining. Our experimental evaluation demonstrates the effectiveness of TOP at capturing meaningful outlier patterns in several real-world IoT use cases. We also demonstrate the efficiency of TOP, showing it to be up to 2 orders of magnitude faster than adapting state-of-the-art mining to produce this new class of contextual outlier patterns, allowing us to scale outlier pattern mining to large sequence datasets.
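The distinction the abstract draws between a pattern occurring on its own and occurring inside a super-pattern can be illustrated with a toy sketch. This is not TOP's Reduce algorithm and makes no claim about the paper's actual definitions: the function names `find_occurrences` and `independent_occurrences` are hypothetical, matching is assumed to be contiguous for simplicity, and the example sequence is invented.

```python
def find_occurrences(seq, pattern):
    """Return start indices of contiguous occurrences of `pattern` in `seq`."""
    n, m = len(seq), len(pattern)
    return [i for i in range(n - m + 1) if seq[i:i + m] == pattern]


def independent_occurrences(seq, pattern, super_pattern):
    """Return start indices where `pattern` occurs outside every
    occurrence of `super_pattern` (assumption: contiguous matching)."""
    # Half-open [start, end) spans covered by the super-pattern.
    covered = [(s, s + len(super_pattern))
               for s in find_occurrences(seq, super_pattern)]
    m = len(pattern)
    return [i for i in find_occurrences(seq, pattern)
            if not any(s <= i and i + m <= e for s, e in covered)]


# Invented symbolic sequence: "AB" usually appears as part of "ABCD".
seq = list("ABCDxABCDyABzABCD")
total = find_occurrences(seq, list("AB"))
indep = independent_occurrences(seq, list("AB"), list("ABCD"))
print(len(total), "occurrences of AB in total,", len(indep), "independent")
```

In this toy sequence, counting every occurrence of "AB" hides the fact that it almost always appears as the prefix of "ABCD"; separating out the occurrences that fall outside the super-pattern exposes the single standalone one, which is the kind of occurrence a plain frequency count would not flag.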
Pages: 920 - 932
Page count: 13
Related papers
50 in total
  • [41] Efficient Discovery of Partial Periodic-Frequent Patterns in Temporal Databases
    Nakamura, So
    Kiran, R. Uday
    Likhitha, P.
    Ravikumar, P.
    Watanobe, Yutaka
    Dao, Minh Son
    Zettsu, Koji
    Toyoda, Masashi
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2021, PT I, 2021, 12923 : 221 - 227
  • [42] Methods for the Efficient Discovery of Large Item-Indexable Sequential Patterns
    Henriques, Rui
    Antunes, Claudia
    Madeira, Sara C.
    NEW FRONTIERS IN MINING COMPLEX PATTERNS, NFMCP 2013, 2014, 8399 : 100 - 116
  • [43] Efficient Discovery of Compact Maximal Behavioral Patterns from Event Logs
    Acheli, Mehdi
    Grigori, Daniela
    Weidlich, Matthias
    ADVANCED INFORMATION SYSTEMS ENGINEERING (CAISE 2019), 2019, 11483 : 579 - 594
  • [44] Towards Efficient Discovery of Partial Periodic Patterns in Columnar Temporal Databases
    Ravikumar, Penugonda
    Raj, Venus Vikranth
    Likhitha, Palla
    Kiran, Rage Uday
    Watanobe, Yutaka
    Ito, Sadanori
    Zettsu, Koji
    Toyoda, Masashi
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, PT II, 2022, 13758 : 141 - 154
  • [45] Efficient Discovery of Top-K Minimal Jumping Emerging Patterns
    Terlecki, Pawel
    Walczak, Krzysztof
    ROUGH SETS AND CURRENT TRENDS IN COMPUTING, PROCEEDINGS, 2008, 5306 : 438 - 447
  • [46] Efficient Discovery of Periodic-Frequent Patterns in Columnar Temporal Databases
    Ravikumar, Penugonda
    Likhitha, Palla
    Raj, Bathala Venus Vikranth
    Kiran, Rage Uday
    Watanobe, Yutaka
    Zettsu, Koji
    ELECTRONICS, 2021, 10 (12)
  • [47] Efficient discovery of periodic-frequent patterns in very large databases
    Kiran, R. Uday
    Kitsuregawa, Masaru
    Reddy, P. Krishna
    JOURNAL OF SYSTEMS AND SOFTWARE, 2016, 112 : 110 - 121
  • [48] SAODR: sequence analysis for outlier data rejection
    Pavese, F
    Ichim, D
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2004, 15 (10) : 2047 - 2052
  • [49] An efficient histogram method for outlier detection
    Gebski, Matthew
    Wong, Raymond K.
    ADVANCES IN DATABASES: CONCEPTS, SYSTEMS AND APPLICATIONS, 2007, 4443 : 176 - +
  • [50] An Efficient Model for Mining Outlier Opinions
    Hassan, Neama
    Abd-Elmegid, Laila A.
    Helmy, Yehia K.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 146 - 153