Weighted Outlier Detection of High-Dimensional Categorical Data Using Feature Grouping

Cited by: 20

Authors:

Li, Junli [1,2]

Zhang, Jifu [1]

Pang, Ning [1]

Qin, Xiao [3]

Affiliations:

[1] Taiyuan Univ Sci & Technol, Sch Comp Sci & Technol, Taiyuan 030024, Peoples R China

[2] Jinzhong Univ, Sch Informat Technol & Engn, Jinzhong 030619, Peoples R China

[3] Auburn Univ, Samuel Ginn Coll Engn, Dept Comp Sci & Software Engn, Auburn, AL 36849 USA

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2020 / Vol. 50 / Issue 11

Funding:

US National Science Foundation; National Natural Science Foundation of China;

Keywords:

Feature extraction; Anomaly detection; Correlation; Machine learning algorithms; Clustering algorithms; Entropy; Categorical data; feature grouping; feature relation; feature weighting; outlier detection; PATTERN;

DOI:

10.1109/TSMC.2018.2847625

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

We propose a weighted outlier mining method called WATCH to identify outliers in high-dimensional categorical datasets. WATCH is composed of two distinct modules: 1) feature grouping by virtue of correlation measurement among features and 2) outlier mining by assigning scores to objects in each feature group. At the heart of WATCH is the feature grouping module, which partitions the features into multiple groups to discover various aspects of feature patterns in each group. The outlier mining module detects outliers from high-dimensional categorical datasets. Apart from the number of outliers specified by users, WATCH avoids the need to optimize any user-given parameter. We implement and evaluate WATCH using synthetic and real-world datasets. Our experimental results show that WATCH is a promising and practical algorithm to detect outliers in high-dimensional categorical datasets, because WATCH achieves high performance in terms of precision, efficiency, and interpretability.
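The two-module structure described in the abstract (group correlated features, then score objects within each group) can be illustrated with a minimal Python sketch. This is not the WATCH algorithm itself: the abstract does not give its correlation measure, grouping rule, or weighting scheme. The sketch assumes normalized mutual information as the correlation measure, a greedy seed-based grouping with a hypothetical `threshold` parameter (which WATCH reportedly does not need), a group weight proportional to group size, and a rarity score based on how often an object's value combination occurs within a group. All function names are illustrative.

```python
from collections import Counter
from math import log


def entropy(values):
    """Shannon entropy (in nats) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def nmi(x, y):
    """Mutual information of two columns, normalized by the larger entropy."""
    hx, hy = entropy(x), entropy(y)
    if hx == 0 or hy == 0:
        return 0.0  # a constant column shares no information with any other
    return (hx + hy - entropy(list(zip(x, y)))) / max(hx, hy)


def group_features(rows, threshold=0.3):
    """Assumed greedy rule: a feature joins the first group whose seed
    (first) feature it correlates with; otherwise it starts a new group."""
    columns = list(zip(*rows))
    groups = []
    for j, col in enumerate(columns):
        for group in groups:
            if nmi(columns[group[0]], col) >= threshold:
                group.append(j)
                break
        else:
            groups.append([j])
    return groups


def outlier_scores(rows, groups):
    """Sum, over groups, of weight * rarity of the object's value combination.
    The weight (share of features in the group) is an assumption."""
    n, d = len(rows), len(rows[0])
    scores = [0.0] * n
    for group in groups:
        keys = [tuple(row[j] for j in group) for row in rows]
        freq = Counter(keys)
        weight = len(group) / d
        for i, key in enumerate(keys):
            scores[i] += weight * (1 - freq[key] / n)
    return scores


def top_outliers(rows, k, threshold=0.3):
    """Indices of the k highest-scoring objects (ties broken by index)."""
    scores = outlier_scores(rows, group_features(rows, threshold))
    return sorted(range(len(rows)), key=lambda i: (-scores[i], i))[:k]
```

Calling `top_outliers(rows, k)` on a list of equal-length tuples of categorical values returns the indices of the `k` objects whose value combinations are rarest within the discovered feature groups; `k` plays the role of the user-specified number of outliers mentioned in the abstract.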

Pages: 4295-4308

Page count: 14
