Empirical study of outlier impact in classification context

Cited by: 0
Authors
Khan, Hufsa [1 ,2 ]
Rasheed, Muhammad Tahir [3 ]
Zhang, Shengli [1 ]
Wang, Xizhao [4 ,5 ]
Liu, Han [4 ,5 ]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Shenzhen Inst Comp Sci, Shenzhen 518000, Peoples R China
[3] Shenzhen Technol Univ, Coll Big data & Internet, Shenzhen 518118, Peoples R China
[4] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[5] Shenzhen Univ, Guangdong Prov Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Outlier handling; Fuzzy c-means cluster; AdaBoost; Classification;
DOI
10.1016/j.eswa.2024.124953
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of data mining, outlier detection is an important and challenging task. This paper studies the impact of outliers on the performance of models learned from large-scale datasets when it is unknown whether a given data point is an outlier. In the proposed approach, a fuzzy c-means clustering algorithm is applied, and outliers are defined as samples that have high membership values in a cluster yet are located far from the cluster centroid; ideally, a sample with a higher membership value should lie closer to the centroid. In this context, the weight of each sample is calculated using the AdaBoost algorithm, where the weight reflects how representative the sample is of the data distribution. The impact of weighted loss functions in different situations is also discussed in detail. Finally, the method is evaluated on 12 UCI datasets, and its accuracy exceeds 95% on some of them, such as banknote 99.99%, biodeg 99.01%, optdigits 99.19%, and letters 97.37%. The experimental results show the efficiency and effectiveness of the proposed approach.
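The outlier definition in the abstract (high membership, yet far from the centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the fuzzy c-means routine below is the textbook update rule, and the function names, the `u_min` membership threshold, and the `dist_factor` rule (distance greater than a multiple of the cluster's median distance) are all assumptions introduced here for illustration. The AdaBoost weighting step is not shown.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=200, tol=1e-6, seed=0):
    """Textbook fuzzy c-means; returns (centroids, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row sums to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        centroids = (W.T @ X) / W.sum(axis=0)[:, None]
        # Distance from every sample to every centroid, shape (n, c).
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centroids, U

def flag_outliers(X, centroids, U, u_min=0.9, dist_factor=4.0):
    """Hypothetical rule: flag samples whose top membership is at least
    u_min but whose distance to that cluster's centroid exceeds
    dist_factor times the cluster's median distance."""
    labels = U.argmax(axis=1)
    top_u = U.max(axis=1)
    dist = np.linalg.norm(X - centroids[labels], axis=1)
    flags = np.zeros(X.shape[0], dtype=bool)
    for k in range(centroids.shape[0]):
        in_k = labels == k
        if in_k.any():
            cutoff = dist_factor * np.median(dist[in_k])
            flags[in_k] = (top_u[in_k] >= u_min) & (dist[in_k] > cutoff)
    return flags

# Synthetic demo data (not from the paper): two tight blobs plus one
# point that lies far outside the first blob, away from the second.
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 0.0), scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b, [[-3.0, 0.0]]])

centroids, U = fuzzy_c_means(X, n_clusters=2)
flags = flag_outliers(X, centroids, U)
print("flagged indices:", np.flatnonzero(flags))
```

The appended point is much nearer to one centroid than to the other, so its membership in that cluster is high even though it sits well outside the blob, which is the mismatch the abstract describes.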
Pages: 10
Related papers
50 in total
  • [41] Outlier Detection in Graphs: A Study on the Impact of Multiple Graph Models
    Campos, Guilherme Oliveira
    Moreira, Edre
    Meira, Wagner, Jr.
    Zimek, Arthur
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2019, 16 (02) : 565 - 595
  • [42] Prediction and outlier detection in classification problems
    Guan, Leying
    Tibshirani, Robert
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2022, 84 (02) : 524 - 546
  • [43] Ultrasonic Wire Bond Outlier Classification
    Seppänen, Henri
    Chua, Siang Tat
    Martinez, Joel Elizondo
    Villa, Pedro
    Advancing Microelectronics, 2022, 49 (04) : 12 - 15
  • [44] Outlier Detection: Methods, Models, and Classification
    Boukerche, Azzedine
    Zheng, Lining
    Alfandi, Omar
    ACM COMPUTING SURVEYS, 2020, 53 (03)
  • [45] Automated risk classification and outlier detection
    Iyer, Naresh
    Bonissone, Piero P.
    2007 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE IN MULTI-CRITERIA DECISION MAKING, 2007, : 272 - +
  • [46] Outlier Robust Gaussian Process Classification
    Kim, Hyun-Chul
    Ghahramani, Zoubin
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, 2008, 5342 : 896 - +
  • [47] Classification and outlier identification for the GAIA mission
    Hennig, C
    NEURAL NETWORK WORLD, 2005, 15 (04) : 335 - 342
  • [48] Supervised outlier detection for classification and regression
    Fernandez, Angela
    Bella, Juan
    Dorronsoro, Jose R.
    NEUROCOMPUTING, 2022, 486 : 77 - 92
  • [49] Empirical likelihood for outlier detection in regression models
    Baragona R.
    Battaglia F.
    Cucina D.
    Journal of Statistical Theory and Practice, 2018, 12 (2) : 255 - 281
  • [50] Improving Classification by Outlier Detection and Removal
    Sharma, Pankaj Kumar
    Haleem, Hammad
    Ahmad, Tanvir
    EMERGING ICT FOR BRIDGING THE FUTURE, VOL 2, 2015, 338 : 621 - 628