Incomplete mixed data-driven outlier detection based on local-global neighborhood information

Cited by: 9
Authors
Li, Ran [1]
Chen, Hongchang [1, 2]
Liu, Shuxin [1, 2]
Li, Xing [1, 2]
Li, Yingle [1, 2]
Wang, Biao [1]
Affiliations
[1] PLA Strateg Support Force Informat Engn Univ, Inst Informat Technol, Zhengzhou 450002, Henan, Peoples R China
[2] Natl Digital Switching Syst Engn & Technol R&D Ctr, Zhengzhou 450002, Henan, Peoples R China
Keywords
Markov random walk; Incomplete mixed data; Graph-based outlier detection; Unsupervised attribute reduction; Local-global information mining; Neighborhood information entropy; ATTRIBUTE REDUCTION; NETWORK
DOI
10.1016/j.ins.2023.03.037
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Outlier detection is a crucial task for identifying unexpected patterns, errors, and behaviors; maximizing the valuable information obtained from ubiquitous incomplete, redundant, noisy, and mixed data therefore poses a great challenge. To achieve efficient graph-based outlier detection, we enhance the connectivity between similar objects and weaken the connectivity between heterogeneous objects. The network structure proposed in this paper is called the incomplete local and global neighborhood information (ILGNI) network. In this network, incomplete mixed data can be exploited from two aspects: single-attribute local information and multi-attribute global information. Specifically, we first use unsupervised attribute reduction methods to improve data quality. Then, from the perspective of local and global information, we use the level of similarity between objects to design strong-neighborhood and weak-similarity relations that deal with incomplete data. On this basis, the topology of a large number of fine-grained neighborhood information networks can be reconstructed. Finally, the outlier scores are calculated from the stationary distribution of the Markov random walk model over the ILGNI network. Experiments conducted on four real telecom fraud datasets demonstrate that the proposed algorithm achieves enhanced outlier detection performance with low time complexity. In addition, the proposed method can effectively mine the information contained in incomplete data and is highly applicable to both feature-related and feature-independent datasets.
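The final step of the abstract, scoring objects by the stationary distribution of a Markov random walk over a similarity network, can be illustrated with a minimal sketch. This is not the authors' ILGNI construction: the weight matrix is assumed to be given, and the uniform restart (`damping`), the function names, and the score normalization are all hypothetical choices made for illustration. The idea shown is only that weakly connected objects receive little stationary probability and therefore a high outlier score.

```python
import numpy as np

def stationary_distribution(weights, damping=0.1, tol=1e-10, max_iter=1000):
    """Stationary distribution of a random walk with uniform restart.

    `weights` is an assumed non-negative similarity matrix; `damping`
    (the restart probability) is an illustrative choice, not a value
    taken from the paper.
    """
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    row_sums = w.sum(axis=1, keepdims=True)
    # Row-normalize; rows without outgoing weight jump uniformly.
    safe = np.where(row_sums > 0, row_sums, 1.0)
    p = np.where(row_sums > 0, w / safe, 1.0 / n)
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        # One power-iteration step with uniform restart.
        nxt = (1.0 - damping) * (pi @ p) + damping / n
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

def outlier_scores(weights, **kwargs):
    """Higher score means less stationary probability (hypothetical scaling)."""
    pi = stationary_distribution(weights, **kwargs)
    return 1.0 - pi / pi.max()

# Toy network: objects 0-2 are strongly similar, object 3 is weakly attached.
W = np.array([
    [0.0, 1.0, 1.0, 0.1],
    [1.0, 0.0, 1.0, 0.1],
    [1.0, 1.0, 0.0, 0.1],
    [0.1, 0.1, 0.1, 0.0],
])
scores = outlier_scores(W)
```

In this toy network the weakly attached object receives the largest score, which mirrors the abstract's intuition of strengthening links between similar objects and weakening links between heterogeneous ones.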
Pages: 204-225
Page count: 22
Related papers
50 in total
  • [1] Hybrid data-driven outlier detection based on neighborhood information entropy and its developmental measures
    Yuan, Zhong
    Zhang, Xianyong
    Feng, Shan
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2018, 112 : 243 - 257
  • [2] Outlier detection based on multisource information fusion in incomplete mixed data
    Li, Ran
    Chen, Hongchang
    Liu, Shuxin
    Wang, Kai
    Liu, Shuo
    Su, Zhe
    [J]. APPLIED SOFT COMPUTING, 2024, 165
  • [3] Outlier Detection of Mixed Data Based on Neighborhood Combinatorial Entropy
    Wang, Lina
    Zhang, Qixiang
    Niu, Xiling
    Ren, Yongjun
    Xia, Jinyue
    [J]. CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 69 (02) : 1765 - 1781
  • [4] Outlier detection on uncertain data based on local information
    Liu, Jing
    Deng, HuiFang
    [J]. KNOWLEDGE-BASED SYSTEMS, 2013, 51 : 60 - 71
  • [5] Attribute reduction for incomplete mixed data based on neighborhood information system
    Li, Ran
    Chen, Hongchang
    Liu, Shuxin
    Jiang, Haocong
    Wang, Biao
    [J]. INTERNATIONAL JOURNAL OF GENERAL SYSTEMS, 2024, 53 (02) : 127 - 153
  • [6] Data-Driven Event Detection of Power Systems Based on Unequal-Interval Reduction of PMU Data and Local Outlier Factor
    Liu, Shengyuan
    Zhao, Yuxuan
    Lin, Zhenzhi
    Liu, Yilu
    Ding, Yi
    Yang, Li
    Yi, Shimin
    [J]. IEEE TRANSACTIONS ON SMART GRID, 2020, 11 (02) : 1630 - 1643
  • [7] Data-Driven Pattern Identification and Outlier Detection in Time Series
    Khoshrou, Abdolrahman
    Pauwels, Eric J.
    [J]. INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 471 - 484
  • [8] Neighborhood relevant outlier detection approach based on information entropy
    Yu, Qingying
    Luo, Yonglong
    Chen, Chuanming
    Bian, Weixin
    [J]. INTELLIGENT DATA ANALYSIS, 2016, 20 (06) : 1247 - 1265
  • [9] Time-on-Task Estimation by data-driven Outlier Detection based on Learning Activities
    Rotelli, Daniela
    Monreale, Anna
    [J]. LAK22 CONFERENCE PROCEEDINGS: THE TWELFTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, 2022 : 336 - 346
  • [10] Outlier detection approach based on local outlier factor for datasets with mixed attributes
    [J]. Cho, Nam-Wook (nwcho@seoultech.ac.kr), 2016, ICIC Express Letters Office (07):