Empirical study of outlier impact in classification context

Authors
Khan, Hufsa [1, 2]
Rasheed, Muhammad Tahir [3]
Zhang, Shengli [1]
Wang, Xizhao [4, 5]
Liu, Han [4, 5]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Shenzhen Inst Comp Sci, Shenzhen 518000, Peoples R China
[3] Shenzhen Technol Univ, Coll Big data & Internet, Shenzhen 518118, Peoples R China
[4] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[5] Shenzhen Univ, Guangdong Prov Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Outlier handling; Fuzzy c-means cluster; AdaBoost; Classification
DOI
10.1016/j.eswa.2024.124953
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In the field of data mining, outlier detection is an important and challenging task. This paper studies the impact of outliers on the performance of models learned from large-scale datasets when it is unknown whether a given data point is an outlier. In the proposed approach, a fuzzy c-means clustering algorithm is applied, and outliers are defined as samples that have high membership values in a cluster yet are located far from the cluster centroid. Ideally, a sample with a higher membership value should be located closer to the cluster centroid. In this context, we calculate the weight of each sample using the AdaBoost algorithm, where the weight determines the representativeness of the sample within the data distribution. Additionally, the impact of weighted loss functions in different situations is discussed in detail. Finally, our method is evaluated on 12 UCI datasets, and its accuracy is greater than 95% on some datasets, such as banknote (99.99%), biodeg (99.01%), optdigits (99.19%), and letters (97.37%). The experimental results show the efficiency and effectiveness of the proposed approach.
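The outlier definition in the abstract (high membership, yet far from the centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the centroids are already known (for example, from a completed fuzzy c-means run), uses the standard fuzzy c-means membership formula with fuzzifier `m`, and applies two hypothetical thresholds, `membership_min` and `distance_quantile`, that do not come from the paper. The AdaBoost weighting step is not covered.

```python
import numpy as np

def fcm_memberships(X, centroids, m=2.0):
    """Standard fuzzy c-means membership update for fixed centroids.

    Returns (u, d): memberships and Euclidean distances,
    both of shape (n_samples, n_clusters).
    """
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)            # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))
    u = inv / inv.sum(axis=1, keepdims=True)
    return u, d

def flag_outliers(u, d, membership_min=0.9, distance_quantile=0.9):
    """Flag samples with high membership that lie unusually far from
    their own centroid. Both thresholds are illustrative assumptions."""
    labels = u.argmax(axis=1)                       # dominant cluster
    top_u = u.max(axis=1)                           # its membership value
    own_d = d[np.arange(len(labels)), labels]       # distance to own centroid
    flags = np.zeros(len(labels), dtype=bool)
    for k in np.unique(labels):
        in_k = labels == k
        cutoff = np.quantile(own_d[in_k], distance_quantile)
        flags[in_k] = (top_u[in_k] >= membership_min) & (own_d[in_k] > cutoff)
    return flags

# Toy data: two tight groups plus one distant sample per group.
X = np.array([[0.1, 0.0], [-0.2, 0.0], [0.0, 0.3], [0.0, -0.4], [-3.0, 0.0],
              [10.1, 0.0], [9.8, 0.0], [10.0, 0.3], [10.0, -0.4], [13.0, 0.0]])
centroids = np.array([[0.0, 0.0], [10.0, 0.0]])     # assumed known
u, d = fcm_memberships(X, centroids)
print(flag_outliers(u, d))
```

In this toy setup the two samples at distance 3 from their centroids keep a membership of about 0.95 and are the ones flagged; a per-cluster distance quantile is only one of several plausible ways to decide what "far" means.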
Pages: 10