Effect of inconsistency rate of granulated datasets on classification performance: An experimental approach

Cited by: 1
Author
Wu, ChienHsing [1]
Affiliation
[1] Natl Univ Kaohsiung, Dept Informat Management, 700 Kaohsiung Univ Rd, Kaohsiung 81148, Taiwan
Keywords
Knowledge discovery; Granulation; Data inconsistency; Prediction accuracy; Label noise; Discretization; Algorithm
DOI
10.1016/j.ins.2022.11.135
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
An experiment was conducted to investigate the effect of the inconsistency rate (IR) of granulated datasets on classification performance. Unsupervised (equal-width interval, EWI) and supervised (minimum description length, MDL) techniques were used to granulate 36 datasets. An algorithm was developed to divide each granulated dataset into consistent and inconsistent subsets. Five classifiers, one simple tree-based and four ensemble-based, were applied to the datasets before granulation (BG), after granulation but before removal of the inconsistent subsets (AGBR), and after removal of the inconsistent subsets (AR), and their prediction accuracy (PA) was then tested and compared. The experimental results showed the following: (1) 24 of the 36 datasets granulated via EWI and 28 of the 36 granulated via MDL contain inconsistent subsets. (2) With both EWI and MDL, PA on the AR datasets is more likely to be higher than on the BG and AGBR datasets for all classifiers. (3) Mean PA improvement ranges from 5.74% to 10.01% with EWI and from 8.74% to 13.73% with MDL. (4) The correlation coefficient between IR and PA improvement ranges from 0.7413 to 0.7901 with EWI and from 0.7870 to 0.9683 with MDL. These results demonstrate the value of uncovering the effect of IR on classification performance in machine learning. © 2022 Elsevier Inc. All rights reserved.
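The preprocessing pipeline described in the abstract (EWI granulation, then splitting a granulated dataset into consistent and inconsistent subsets and measuring IR) can be illustrated with a short Python sketch. This is not the author's algorithm: it assumes a record is inconsistent when another record with the same granulated attribute values carries a different class label, and it takes IR as the fraction of such records. The paper's exact definition may differ (a common alternative counts only the non-majority records in each conflicting group). The function names `equal_width_bins`, `granulate`, `split_consistent` and `inconsistency_rate`, the interval count `k`, and the toy data are all hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def equal_width_bins(values: Sequence[float], k: int) -> list[int]:
    """EWI granulation of one attribute: map each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant attribute: everything falls into a single interval
        return [0] * len(values)
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value inside the last interval
    return [min(int((v - lo) / width), k - 1) for v in values]


def granulate(data: Sequence[Sequence[float]], k: int) -> list[list[int]]:
    """Apply EWI granulation column by column (hypothetical helper)."""
    columns = [equal_width_bins(col, k) for col in zip(*data)]
    return [list(row) for row in zip(*columns)]


def split_consistent(
    rows: Sequence[Sequence[int]], labels: Sequence[Hashable]
) -> tuple[list[int], list[int]]:
    """Return (consistent, inconsistent) row indices.

    Rows sharing the same granulated attribute tuple form a group; a group whose
    members carry more than one class label is treated as inconsistent (assumption).
    """
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[tuple(row)].append(i)
    consistent: list[int] = []
    inconsistent: list[int] = []
    for members in groups.values():
        target = inconsistent if len({labels[i] for i in members}) > 1 else consistent
        target.extend(members)
    return sorted(consistent), sorted(inconsistent)


def inconsistency_rate(rows: Sequence[Sequence[int]], labels: Sequence[Hashable]) -> float:
    """IR as the fraction of rows that fall into inconsistent groups (assumed definition)."""
    _, inconsistent = split_consistent(rows, labels)
    return len(inconsistent) / len(rows)


if __name__ == "__main__":
    data = [[1.0, 10.0], [1.2, 10.5], [5.0, 20.0], [9.0, 30.0], [9.5, 29.0]]
    labels = ["a", "b", "a", "b", "b"]
    rows = granulate(data, k=2)
    print(rows)  # [[0, 0], [0, 0], [0, 1], [1, 1], [1, 1]]
    consistent, inconsistent = split_consistent(rows, labels)
    print(consistent, inconsistent)  # [2, 3, 4] [0, 1]
    print(inconsistency_rate(rows, labels))  # 0.4
    # Keeping only the consistent rows gives an AR-style dataset.
    print([rows[i] for i in consistent])  # [[0, 1], [1, 1], [1, 1]]
```

In the toy data, the first two records collapse to the same granulated tuple but have different labels, so they form the inconsistent subset. With fewer intervals (smaller `k`), more records tend to share a tuple, which can raise IR. A supervised MDL discretizer would take the place of `equal_width_bins` but is not shown here.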
Pages: 357-373
Number of pages: 17