WMFP-Outlier: An Efficient Maximal Frequent-Pattern-Based Outlier Detection Approach for Weighted Data Streams

Cited by: 10
Authors
Cai, Saihua [1]
Li, Qian [1]
Li, Sicong [1]
Yuan, Gang [1]
Sun, Ruizhi [1, 2]
Institutions
[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China
[2] Minist Agr, Sci Res Base Integrated Technol Precis Agr Anim H, Beijing 100083, Peoples R China
Source
INFORMATION TECHNOLOGY AND CONTROL | 2019 / Vol. 48 / No. 04
Keywords
outlier detection; weighted maximal frequent-pattern mining; weighted data stream; deviation indices; data mining;
DOI
10.5755/j01.itc.48.4.22176
CLC classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Outliers are a major factor affecting accuracy in data science, so many outlier detection approaches have been proposed to identify the implicit outliers in static datasets and thereby improve data reliability. In recent years, data streams have become a predominant form of data, and the elements of a data stream are not always equally important. However, existing outlier detection approaches do not take such weights into account, so they are not suitable for processing weighted data streams. In addition, traditional pattern-based outlier detection approaches incur a high time cost in the outlier detection phase. To address these problems, this paper proposes WMFP-Outlier, a two-phase pattern-based approach for detecting the implicit outliers in a weighted data stream, which uses maximal frequent patterns instead of frequent patterns to accelerate outlier detection. In the maximal frequent-pattern mining phase, the anti-monotonicity property and an MFP-array structure are used to speed up mining. In the outlier detection phase, three deviation indices are designed to measure the degree of abnormality of each transaction, and the transactions with the highest degrees of abnormality are judged to be outliers. Finally, experiments on a synthetic dataset are conducted to evaluate the performance of WMFP-Outlier. The results show that WMFP-Outlier achieves higher accuracy than existing pattern-based outlier detection approaches, and that the time cost of its outlier detection phase is lower than those of the four other pattern-based approaches compared.
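The abstract describes the pipeline only at a high level; it does not define the weighted support measure, the MFP-array structure, or the three deviation indices. The following is a minimal Python sketch under stated assumptions, not the authors' WMFP-Outlier implementation. It assumes that a transaction's weight is the mean of its item weights and that an itemset's weighted support is the total weight of the transactions containing it divided by the total weight of the current window (one common definition, chosen here because it is anti-monotone). It mines weighted frequent itemsets level-wise with anti-monotone pruning, keeps only the maximal ones, and ranks transactions by a single hypothetical deviation index. The function names `mine_weighted_frequent`, `maximal`, `deviation`, and `top_k_outliers` are illustrative only.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Itemset = FrozenSet[str]


def mine_weighted_frequent(window: List[Set[str]], w: Dict[str, float],
                           min_wsup: float) -> Dict[Itemset, float]:
    """Level-wise (Apriori-style) mining of weighted frequent itemsets.

    Assumption (not from the paper): transaction weight = mean item weight,
    wsup(X) = weight of transactions containing X / total window weight.
    This measure is anti-monotone, so supersets of infrequent sets are pruned.
    """
    tws = [sum(w[i] for i in t) / len(t) if t else 0.0 for t in window]
    total = sum(tws)
    if total == 0:
        return {}

    def wsup(x: Itemset) -> float:
        return sum(tw for t, tw in zip(window, tws) if x <= t) / total

    level = [frozenset([i]) for i in sorted({i for t in window for i in t})]
    frequent: Dict[Itemset, float] = {}
    k = 1
    while level:
        current: Dict[Itemset, float] = {}
        for x in level:
            s = wsup(x)
            if s >= min_wsup:
                current[x] = s
        frequent.update(current)
        # Join frequent k-itemsets into (k+1)-candidates and drop any candidate
        # that has an infrequent k-subset (anti-monotonicity pruning).
        candidates = set()
        for a, b in combinations(current, 2):
            u = a | b
            if len(u) == k + 1 and all(frozenset(sub) in current
                                       for sub in combinations(u, k)):
                candidates.add(u)
        level = list(candidates)
        k += 1
    return frequent


def maximal(frequent: Dict[Itemset, float]) -> Dict[Itemset, float]:
    """Keep only itemsets that have no frequent proper superset."""
    return {x: s for x, s in frequent.items()
            if not any(x < y for y in frequent)}


def deviation(t: Set[str], mfps: Dict[Itemset, float]) -> float:
    """Hypothetical index: 1 - (largest overlap with one MFP) / |t|."""
    if not t:
        return 1.0
    best = max((len(t & x) for x in mfps), default=0)
    return 1.0 - best / len(t)


def top_k_outliers(window: List[Set[str]], w: Dict[str, float],
                   min_wsup: float, k: int) -> List[int]:
    """Indices of the k transactions with the highest deviation."""
    mfps = maximal(mine_weighted_frequent(window, w, min_wsup))
    ranked = sorted(range(len(window)),
                    key=lambda i: (-deviation(window[i], mfps), i))
    return ranked[:k]


window = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"x", "y"}]
weights = {"a": 0.9, "b": 0.8, "c": 0.7, "x": 0.2, "y": 0.3}
print(top_k_outliers(window, weights, min_wsup=0.4, k=1))  # [4]
```

Using only maximal patterns loses nothing for this particular index: every frequent subset of a transaction T lies inside some maximal pattern X, and T ∩ X is itself frequent by anti-monotonicity, so the largest overlap with a single maximal pattern equals the size of the largest frequent itemset contained in T. Because maximal patterns are never more numerous than frequent ones (and are usually far fewer), comparing transactions against them is one plausible reason the detection phase gets cheaper. In a stream setting the window would be re-mined, or maintained incrementally, as it slides; that part, and the MFP-array structure, are not reproduced here.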
Pages: 505-521
Page count: 17