Fast distributed outlier detection in mixed-attribute data sets

Cited by: 116
Authors
Otey, ME [1]
Ghoting, A [1]
Parthasarathy, S [1]
Affiliations
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Funding
U.S. National Science Foundation
Keywords
outlier detection; anomaly detection; distributed data mining; mining dynamic data; mixed-attribute data sets; data streams
DOI
10.1007/s10618-005-0014-6
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Efficiently detecting outliers or anomalies is an important problem in many areas of science, medicine and information technology. Applications range from data cleaning to clinical diagnosis, from detecting anomalous defects in materials to fraud and intrusion detection. Over the past decade, researchers in data mining and statistics have addressed the problem of outlier detection using both parametric and non-parametric approaches in a centralized setting. However, there are still several challenges that must be addressed. First, most approaches to date have focused on detecting outliers in a continuous attribute space. However, almost all real-world data sets contain a mixture of categorical and continuous attributes. Categorical attributes are typically ignored or incorrectly modeled by existing approaches, resulting in a significant loss of information. Second, there have not been any general-purpose distributed outlier detection algorithms. Most distributed detection algorithms are designed with a specific domain (e.g. sensor networks) in mind. Third, the data sets being analyzed may be streaming or otherwise dynamic in nature. Such data sets are prone to concept drift, and models of the data must be dynamic as well. To address these challenges, we present a tunable algorithm for distributed outlier detection in dynamic mixed-attribute data sets.
Pages: 203-228
Page count: 26
Related papers
50 in total
  • [21] An online learning algorithm for a neuro-fuzzy classifier with mixed-attribute data
    Khuat, Thanh Tung
    Gabrys, Bogdan
    [J]. APPLIED SOFT COMPUTING, 2023, 137
  • [22] Fast outlier detection using rough sets theory
    Shaari, F.
    Bakar, A. A.
    Hamdan, A. R.
    [J]. DATA MINING IX: DATA MINING, PROTECTION, DETECTION AND OTHER SECURITY TECHNOLOGIES, 2008, 40 : 25 - 34
  • [23] Distributed Local Outlier Detection in Big Data
    Yan, Yizhou
    Cao, Lei
    Kuhlman, Caitlin
    Rundensteiner, Elke
    [J]. KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017: 1225 - 1234
  • [24] An Efficient Mixed Attribute Outlier Detection Method for Identifying Network Intrusions
    Beulah, J. Rene
    Punithavathani, D. Shalini
    [J]. INTERNATIONAL JOURNAL OF INFORMATION SECURITY AND PRIVACY, 2020, 14 (03) : 115 - 133
  • [25] A Novel Mixed-Attribute Fusion-Based Naive Bayesian Classifier
    Ou, Guiliang
    He, Yulin
    Fournier-Viger, Philippe
    Huang, Joshua Zhexue
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (20):
  • [26] A mixture model framework for class discovery and outlier detection in mixed labeled/unlabeled data sets
    Miller, DJ
    Browning, J
    [J]. 2003 IEEE XIII WORKSHOP ON NEURAL NETWORKS FOR SIGNAL PROCESSING - NNSP'03, 2003: 489 - 498
  • [27] A Hybrid Method to Measure Distribution Consistency of Mixed-Attribute Datasets
    He, Yulin
    Ye, Xuan
    Huang, Defa
    Fournier-Viger, Philippe
    Huang, Joshua Zhexue
    [J]. IEEE Transactions on Artificial Intelligence, 2023, 4 (01) : 182 - 196
  • [28] Distributed outlier detection in hierarchically structured datasets with mixed attributes
    Liang, Qiao
    Wang, Kaibo
    [J]. QUALITY TECHNOLOGY AND QUANTITATIVE MANAGEMENT, 2020, 17 (03) : 337 - 353
  • [29] Continuous adaptive outlier detection on distributed data streams
    Su, Liang
    Han, Weihong
    Yang, Shuqiang
    Zou, Peng
    Jia, Yan
    [J]. HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS, PROCEEDINGS, 2007, 4782 : 74 - 85
  • [30] Improved global-best particle swarm optimization algorithm with mixed-attribute data classification capability
    Nouaouria, Nabila
    Boukadoum, Mounir
    [J]. APPLIED SOFT COMPUTING, 2014, 21 : 554 - 567