A Scalable Complex Event Analytical System with Incremental Episode Mining over Data Streams

Authors
Tseng, Jerry C. C. [1]
Gu, Jia-Yuan [1]
Tseng, Vincent S. [2]
Wang, P. F. [3]
Chen, Ching-Yu [3]
Li, Chu-Feng [3]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Tainan, Taiwan
[2] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[3] Inst Informat Ind, Taipei, Taiwan
Keywords
Data Stream; Incremental Mining; Episode Pattern Mining; Lambda Architecture; Frequent Episodes
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Episode pattern mining is a powerful technique for extracting high-value information that helps solve real-life, cross-disciplinary problems, such as the analysis of manufacturing data, stock markets, and weather records. As data grows, the mining process must be re-triggered repeatedly to obtain up-to-date information. Periodically re-mining the full dataset is not cost-effective, however, and a number of incremental mining approaches have therefore been proposed for growing data. To the best of our knowledge, few studies have targeted the problem of incremental episode mining. Moreover, streaming data of complex events is increasingly common, because digital sensors continuously collect data around us in the big data age. The challenge is therefore not only to mine valuable episode patterns from incrementally growing datasets, but also to mine episode patterns over data streams of complex events. To address this research problem, we adopt the Lambda Architecture to design a scalable complex event analytical system that facilitates incremental episode mining over complex event sequences in data streams. Apache Spark and Apache Spark Streaming serve as the development frameworks for the batch layer and the speed layer, respectively. To take both efficiency and accuracy into account, we develop a series of modules and three algorithms: batch episode mining, delta episode mining, and pattern merging. Experimental validation on a real dataset shows that the proposed system is highly scalable and delivers excellent performance in terms of efficiency and accuracy.
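The abstract names three steps (batch episode mining, delta episode mining, pattern merging) but does not describe how they work. The following is only an illustrative sketch of the general batch/delta/merge idea in plain Python, not the authors' algorithms and not Spark code. It assumes a deliberately simplified setting: episodes are serial pairs `a -> b`, support is the number of occurrences in which `b` strictly follows `a` within `max_span` time units, events are sorted by timestamp, and new events arrive strictly after all historical ones. All function names and parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Event = Tuple[int, str]        # (timestamp, event type); assumed sorted by timestamp
Episode = Tuple[str, str]      # simplified serial episode: first type -> second type


def count_pairs(events: List[Event], max_span: int,
                start_from: Optional[int] = None) -> Counter:
    """Count occurrences of serial pairs a -> b with 0 < t_b - t_a <= max_span.

    If start_from is given, only pairs whose second event has a timestamp
    >= start_from are counted.
    """
    counts: Counter = Counter()
    for j, (tj, b) in enumerate(events):
        if start_from is not None and tj < start_from:
            continue
        # Walk backwards until the time gap exceeds max_span.
        for i in range(j - 1, -1, -1):
            ti, a = events[i]
            if tj - ti > max_span:
                break
            if ti < tj:
                counts[(a, b)] += 1
    return counts


def batch_mine(history: List[Event], max_span: int) -> Counter:
    """Batch step (hypothetical): count every pair in the historical data."""
    return count_pairs(history, max_span)


def delta_mine(history: List[Event], new_events: List[Event],
               max_span: int) -> Counter:
    """Delta step (hypothetical): count only pairs that end in a new event.

    A short tail of the history is kept so that pairs spanning the
    old/new boundary are not missed.
    """
    if not new_events:
        return Counter()
    new_start = new_events[0][0]
    tail = [e for e in history if e[0] >= new_start - max_span]
    return count_pairs(tail + new_events, max_span, start_from=new_start)


def merge_patterns(batch: Counter, delta: Counter,
                   min_support: int) -> Dict[Episode, int]:
    """Merge step (hypothetical): add the counts and keep frequent pairs."""
    total = batch + delta
    return {ep: c for ep, c in total.items() if c >= min_support}


history = [(1, "A"), (2, "B"), (3, "A"), (5, "B")]
new_events = [(6, "A"), (7, "B")]

batch = batch_mine(history, max_span=2)
delta = delta_mine(history, new_events, max_span=2)
frequent = merge_patterns(batch, delta, min_support=2)
print(frequent)
```

Under these assumptions, the merged counts equal what re-mining the full sequence would give, while the delta step touches only the new events and a short tail of the history. A real system would also have to handle longer episodes, other support definitions, and patterns that become frequent only after merging.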
Pages: 648 - 655
Number of pages: 8