Weighted Outlier Detection of High-Dimensional Categorical Data Using Feature Grouping

Cited by: 20
Authors
Li, Junli [1, 2]
Zhang, Jifu [1]
Pang, Ning [1]
Qin, Xiao [3]
Affiliations
[1] Taiyuan Univ Sci & Technol, Sch Comp Sci & Technol, Taiyuan 030024, Peoples R China
[2] Jinzhong Univ, Sch Informat Technol & Engn, Jinzhong 030619, Peoples R China
[3] Auburn Univ, Samuel Ginn Coll Engn, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA
Funding
US National Science Foundation; National Natural Science Foundation of China;
Keywords
Feature extraction; Anomaly detection; Correlation; Machine learning algorithms; Clustering algorithms; Entropy; Categorical data; feature grouping; feature relation; feature weighting; outlier detection; PATTERN;
DOI
10.1109/TSMC.2018.2847625
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
We propose a weighted outlier mining method called WATCH to identify outliers in high-dimensional categorical datasets. WATCH is composed of two distinct modules: 1) feature grouping by virtue of correlation measurement among features and 2) outlier mining by assigning scores to objects in each feature group. At the heart of WATCH is the feature grouping module, which partitions the features into multiple groups to discover various aspects of feature patterns in each group. The outlier mining module detects outliers in high-dimensional categorical datasets. Apart from the number of outliers, which the user specifies, WATCH requires no tuning of user-given parameters. We implement and evaluate WATCH using synthetic and real-world datasets. Our experimental results show that WATCH is a promising and practical algorithm for detecting outliers in high-dimensional categorical datasets, because it achieves high performance in terms of precision, efficiency, and interpretability.
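The two-module structure described in the abstract can be illustrated with a rough Python sketch. This is not the WATCH algorithm itself: the abstract does not give the correlation measure, the grouping rule, or the weighting scheme, so everything below is an assumption made for illustration. The sketch uses normalized mutual information as the correlation measure, links features whose correlation meets a hypothetical `threshold` (a parameter that WATCH itself is said to avoid), weights each group by its share of the features, and scores an object by the rarity of its value combination within each group. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log


def entropy(column):
    """Shannon entropy (in nats) of one categorical column."""
    n = len(column)
    return -sum((c / n) * log(c / n) for c in Counter(column).values())


def normalized_mutual_info(x, y):
    """Mutual information of two columns divided by the smaller entropy."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant column carries no information
    hxy = entropy(list(zip(x, y)))
    return (hx + hy - hxy) / min(hx, hy)


def group_features(rows, threshold=0.5):
    """Assumed grouping rule: link features whose NMI meets the threshold."""
    n_features = len(rows[0])
    columns = [[row[j] for row in rows] for j in range(n_features)]
    parent = list(range(n_features))  # union-find forest over feature indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in combinations(range(n_features), 2):
        if normalized_mutual_info(columns[a], columns[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for j in range(n_features):
        groups.setdefault(find(j), []).append(j)
    return sorted(groups.values())


def outlier_scores(rows, groups):
    """Assumed scoring rule: weighted surprisal of each object's values per group."""
    n = len(rows)
    n_features = len(rows[0])
    scores = [0.0] * n
    for group in groups:
        projected = [tuple(row[j] for j in group) for row in rows]
        counts = Counter(projected)
        weight = len(group) / n_features  # hypothetical group weight
        for i, key in enumerate(projected):
            scores[i] += weight * -log(counts[key] / n)
    return scores


def top_outliers(rows, k, threshold=0.5):
    """Indices of the k highest-scoring objects."""
    scores = outlier_scores(rows, group_features(rows, threshold))
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)[:k]


# Toy data: features 0 and 1 move together, feature 2 is unrelated,
# and the last row breaks the pattern between features 0 and 1.
demo_rows = [("a", "x", "pq"[i % 2]) for i in range(8)]
demo_rows += [("b", "y", "pq"[i % 2]) for i in range(8)]
demo_rows.append(("a", "y", "p"))
```

On this toy data, `group_features(demo_rows)` places features 0 and 1 in one group and feature 2 in another, and `top_outliers(demo_rows, 1)` selects the last row, whose combination of values in the first group occurs only once. As in the abstract, the number of outliers `k` is supplied by the user.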
Pages: 4295-4308
Page count: 14
Related papers
50 in total
  • [1] The influence of feature grouping algorithm in outlier detection with categorical data
    Nathaniel, Sharon Femi Paul Sunder
    Alwarsamy, Kala
    Viswanathan, Rajalakshmi
    Subramanian, Ganesh Vaidyanathan
    Veerabahu, Vidhya
    [J]. ACTA SCIENTIARUM-TECHNOLOGY, 2024, 46 (01)
  • [2] Feature Grouping using Weighted l1 Norm for High-Dimensional Data
    Vinzamuri, Bhanukiran
    Padthe, Karthik K.
    Reddy, Chandan K.
    [J]. 2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2016, : 1233 - 1238
  • [3] Outlier detection for high-dimensional data
    Ro, Kwangil
    Zou, Changliang
    Wang, Zhaojun
    Yin, Guosheng
    [J]. BIOMETRIKA, 2015, 102 (03) : 589 - 599
  • [4] Feature Extraction for Outlier Detection in High-Dimensional Spaces
    Hoang Vu Nguyen
    Gopalkrishnan, Vivekanand
    [J]. PROCEEDINGS OF THE FOURTH INTERNATIONAL WORKSHOP ON FEATURE SELECTION IN DATA MINING, 2010, 10 : 66 - 75
  • [5] Intrinsic dimensional outlier detection in high-dimensional data
    Von Brünken, Jonathan
    Houle, Michael E.
    Zimek, Arthur
    [J]. NII Technical Reports, 2015, (03): : 1 - 12
  • [6] Efficient Outlier Detection for High-Dimensional Data
    Liu, Huawen
    Li, Xuelong
    Li, Jiuyong
    Zhang, Shichao
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2018, 48 (12): : 2451 - 2461
  • [7] Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
    Koufakou, Anna
    Secretan, Jimmy
    Georgiopoulos, Michael
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 29 (03) : 697 - 725
  • [8] Non-derivable itemsets for fast outlier detection in large high-dimensional categorical data
    Anna Koufakou
    Jimmy Secretan
    Michael Georgiopoulos
    [J]. Knowledge and Information Systems, 2011, 29 : 697 - 725
  • [9] Feature grouping-based parallel outlier mining of categorical data using spark
    Li, Junli
    Zhang, Jifu
    Qin, Xiao
    Xun, Yaling
    [J]. INFORMATION SCIENCES, 2019, 504 : 1 - 19
  • [10] A geometric framework for outlier detection in high-dimensional data
    Herrmann, Moritz
    Pfisterer, Florian
    Scheipl, Fabian
    [J]. WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2023, 13 (03)