Empirical study of outlier impact in classification context

Cited by: 0
Authors
Khan, Hufsa [1, 2]
Rasheed, Muhammad Tahir [3]
Zhang, Shengli [1]
Wang, Xizhao [4, 5]
Liu, Han [4, 5]
Affiliations
[1] Shenzhen Univ, Coll Elect & Informat Engn, Shenzhen 518060, Peoples R China
[2] Shenzhen Inst Comp Sci, Shenzhen 518000, Peoples R China
[3] Shenzhen Technol Univ, Coll Big data & Internet, Shenzhen 518118, Peoples R China
[4] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[5] Shenzhen Univ, Guangdong Prov Key Lab Intelligent Informat Proc, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Outlier handling; Fuzzy c-means cluster; AdaBoost; Classification
DOI
10.1016/j.eswa.2024.124953
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In data mining, outlier detection is an important and challenging task. This paper studies the impact of outliers on the performance of models learned from large-scale datasets when it is not known in advance whether a given data point is an outlier. In the proposed approach, a fuzzy c-means clustering algorithm is applied, and outliers are defined as samples that have a high membership value in a cluster yet lie far from that cluster's centroid. Ideally, a sample with a higher membership value should lie closer to the cluster centroid. In this context, the weight of each sample is calculated using the AdaBoost algorithm, where the weight indicates how representative the sample is of the data distribution. The impact of weighted loss functions in different situations is also discussed in detail. Finally, the method is evaluated on 12 UCI datasets, and its accuracy exceeds 95% on some of them: banknote 99.99%, biodeg 99.01%, optdigits 99.19%, and letters 97.37%. The experimental results show the efficiency and effectiveness of the proposed approach.
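The outlier definition in the abstract (high membership in a cluster, yet far from its centroid) can be illustrated with a minimal sketch. It is not the authors' implementation: the membership and centroid updates are the standard fuzzy c-means formulas, while the flagging rule, its thresholds `u_min` and `k`, the function names, and the toy data are all assumptions made for illustration. The AdaBoost weighting step is not covered here.

```python
import math
import statistics

def memberships(points, centroids, m=2.0):
    """Standard fuzzy c-means membership update: u[i][j] for point i, cluster j."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        if any(x == 0.0 for x in d):
            # Point coincides with a centroid: split full membership among those.
            hits = [1.0 if x == 0.0 else 0.0 for x in d]
            u.append([h / sum(hits) for h in hits])
        else:
            u.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return u

def update_centroids(points, u, m=2.0):
    """Standard fuzzy c-means centroid update: mean weighted by u**m."""
    dim = len(points[0])
    centroids = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centroids.append(tuple(
            sum(wi * p[k] for wi, p in zip(w, points)) / total
            for k in range(dim)))
    return centroids

def fuzzy_c_means(points, init_centroids, m=2.0, iters=50):
    """Alternate the two updates for a fixed number of iterations."""
    centroids = list(init_centroids)
    for _ in range(iters):
        u = memberships(points, centroids, m)
        centroids = update_centroids(points, u, m)
    return centroids, memberships(points, centroids, m)

def flag_suspects(points, centroids, u, u_min=0.9, k=3.0):
    """Hypothetical rule (an assumption, not from the paper): flag a point whose
    top membership is >= u_min but whose distance to that cluster's centroid
    exceeds k times the median distance within the cluster."""
    best = [max(range(len(row)), key=row.__getitem__) for row in u]
    dist = [math.dist(p, centroids[j]) for p, j in zip(points, best)]
    flagged = []
    for j in range(len(centroids)):
        members = [i for i, b in enumerate(best) if b == j]
        if not members:
            continue
        cutoff = k * statistics.median(dist[i] for i in members)
        flagged += [i for i in members if u[i][j] >= u_min and dist[i] > cutoff]
    return sorted(flagged)

# Toy data (made up): two tight groups plus one stray point near the first group.
group_a = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.2), (0.5, 0.8)]
group_b = [(x + 10, y + 10) for x, y in group_a]
points = group_a + group_b + [(-4, -4)]

centroids, u = fuzzy_c_means(points, init_centroids=[(0, 0), (10, 10)])
flagged = flag_suspects(points, centroids, u)
print("flagged indices:", flagged)
```

The stray point is far from both groups in absolute terms, but fuzzy memberships are relative, so it still receives a high membership in the nearer cluster. That mismatch between membership and distance is what the hypothetical rule looks for.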
Pages: 10