Projected outlier detection in high-dimensional mixed-attributes data set

Cited by: 25
Authors
Ye, Mao [1]
Li, Xue [2]
Orlowska, Maria E. [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Engn & Comp Sci, Chengdu 610054, Peoples R China
[2] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld 4072, Australia
[3] Fac Informat Technol, Polish Japanese Inst Informat Technol, PL-02008 Warsaw, Poland
Funding
National Natural Science Foundation of China;
Keywords
Outlier detection; Data mining; High-dimensional spaces; Mixed-attribute data sets; ALGORITHM;
DOI
10.1016/j.eswa.2008.08.030
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting outliers efficiently is an active research issue in data mining, with important applications in fraud detection, network intrusion detection, monitoring of criminal activities in electronic commerce, etc. Because high-dimensional data are sparse, it is reasonable and meaningful to detect outliers in suitable projected subspaces. We call such a subspace an anomaly subspace, and the outliers in it projected outliers. Many efficient algorithms have already been proposed for outlier detection based on different approaches, but there is little literature on projected outlier detection for high-dimensional data sets with mixed continuous and categorical attributes. In this paper, a novel algorithm is proposed to detect projected outliers in high-dimensional mixed-attribute data sets. Our main contributions are: (1) a novel measure of anomaly subspace, combined with information entropy, is proposed. In such an anomaly subspace, meaningful outliers can be detected and explained. Unlike previous projected outlier detection methods, the dimension of the anomaly subspace is not decided beforehand; (2) a theoretical analysis of this measure is presented; (3) a bottom-up method is proposed to find the interesting anomaly subspaces; (4) the outlying degree of a projected outlier is defined, which is easy to interpret; (5) data sets with mixed data types are handled; (6) experiments on synthetic and real data sets are performed to evaluate the effectiveness of our approach. (c) 2008 Elsevier Ltd. All rights reserved.
Pages: 7104-7113
Page count: 10
Related papers
50 in total
  • [41] A LoOP based outlier detection method for high dimensional fuzzy data set
    Jahromi, Alireza Fakharzadeh
    Zarei, Fateme
    [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2017, 32 (01) : 241 - 248
  • [42] Detecting Projected Outliers in High-Dimensional Data Streams
    Zhang, Ji
    Gao, Qigang
    Wang, Hai
    Liu, Qing
    Xu, Kai
    [J]. DATABASE AND EXPERT SYSTEMS APPLICATIONS, PROCEEDINGS, 2009, 5690 : 629 - +
  • [43] A NOVEL TENSOR ALGEBRAIC APPROACH FOR HIGH-DIMENSIONAL OUTLIER DETECTION UNDER DATA MISALIGNMENT
    Fan, Bo
    Zhang, Zemin
    Aeron, Shuchin
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2016, : 3628 - 3632
  • [44] Generalized projected clustering in high-dimensional data streams
    Wang, T
    [J]. FRONTIERS OF WWW RESEARCH AND DEVELOPMENT - APWEB 2006, PROCEEDINGS, 2006, 3841 : 772 - 778
  • [45] PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data
    Mejia, Amanda F.
    Nebel, Mary Beth
    Eloyan, Ani
    Caffo, Brian
    Lindquist, Martin A.
    [J]. BIOSTATISTICS, 2017, 18 (03) : 521 - 536
  • [46] A Novel Density-Based Clustering Approach for Outlier Detection in High-Dimensional Data
    Messaoud, Thouraya Aouled
    Smiti, Abir
    Louati, Aymen
    [J]. HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2019, 2019, 11734 : 322 - 331
  • [47] Support high-order tensor data description for outlier detection in high-dimensional big sensor data
    Deng, Xiaowu
    Jiang, Peng
    Peng, Xiaoning
    Mi, Chunqiao
    [J]. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2018, 81 : 177 - 187
  • [48] A High-dimensional Outlier Detection Algorithm Base on Relevant Subspace
    Gao, Zhipeng
    Zhao, Yang
    Niu, Kun
    Fan, Yidan
    [J]. 2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI, 2017, : 1001 - 1008
  • [49] Outlier Detection Using Structural Scores in a High-Dimensional Space
    Li, Xiaojie
    Lv, Jiancheng
    Yi, Zhang
    [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50 (05) : 2302 - 2310
  • [50] CELOF: Effective and fast memory efficient local outlier detection in high-dimensional data streams
    Chen, Liang
    Wang, Wei
    Yang, Yun
    [J]. Applied Soft Computing, 2021, 102