Cluster-Based Instance Selection for the Imbalanced Data Classification

Cited by: 5
Authors
Czarnowski, Ireneusz [1]
Jedrzejowicz, Piotr [1]
Affiliations
[1] Gdynia Maritime Univ, Dept Informat Syst, Morska 83, PL-81225 Gdynia, Poland
Keywords
Instance selection; Clustering; Imbalanced data; Team of agents; INTEGRATION; REDUCTION;
DOI
10.1007/978-3-319-98446-9_18
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Instance selection, often referred to as data reduction, aims at deciding which instances from the training set should be retained for further use during the learning process. Instance selection is an important preprocessing step for many machine learning tools, especially when huge data sets are considered. Class imbalance arises when the number of examples belonging to one class is much greater than the number of examples belonging to another. The paper proposes a cluster-based instance selection approach for imbalanced data classification. The proposed approach is based on a similarity coefficient between training data instances, calculated for each considered data class independently. Similar instances are grouped into clusters. Next, instance selection is carried out. The process of instance selection is controlled and carried out by a team of agents. The proposed approach is validated experimentally. Advantages and main features of the approach are discussed in light of the results of the computational experiment.
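The pipeline outlined in the abstract (a per-class similarity coefficient, grouping of similar instances, then selection from each cluster) could be sketched roughly as follows. This is only an illustrative sketch, not the authors' method. The abstract does not define the coefficient, so the dot product with per-class column sums used here is an assumption. Grouping by rounded coefficient value is likewise assumed, and picking the member nearest the cluster centroid is a simple stand-in for the agent-based selection. The names `similarity_coefficients`, `select_instances`, and `decimals` are hypothetical.

```python
from collections import defaultdict


def similarity_coefficients(rows):
    """Assumed stand-in coefficient: the dot product of each instance with
    the per-attribute column sums of its own class."""
    col_sums = [sum(col) for col in zip(*rows)]
    return [sum(x * s for x, s in zip(row, col_sums)) for row in rows]


def select_instances(X, y, decimals=1):
    """Return the sorted indices of the instances retained after selection."""
    # Process every class independently, so a minority class is never
    # clustered together with (and swamped by) the majority class.
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    kept = []
    for indices in by_class.values():
        coeffs = similarity_coefficients([X[i] for i in indices])

        # Instances whose rounded coefficients coincide form one cluster.
        clusters = defaultdict(list)
        for i, c in zip(indices, coeffs):
            clusters[round(c, decimals)].append(i)

        # Stand-in for the agent-based step: keep the member closest to
        # the cluster centroid (squared Euclidean distance).
        for members in clusters.values():
            centroid = [sum(col) / len(members)
                        for col in zip(*(X[i] for i in members))]
            best = min(members,
                       key=lambda i: sum((a - b) ** 2
                                         for a, b in zip(X[i], centroid)))
            kept.append(best)
    return sorted(kept)


# Toy data: three majority-class instances (two identical), one minority.
X = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.5, 0.5]]
y = [0, 0, 0, 1]
print(select_instances(X, y))  # [0, 2, 3]
```

In the toy run, the two identical majority-class instances collapse into one representative while the single minority-class instance is always kept, which is the kind of behaviour a per-class reduction is meant to provide on imbalanced data.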
Pages: 191-200
Number of pages: 10
Related papers
  • [1] Cluster-based instance selection for machine classification
    Czarnowski, Ireneusz
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 30 (01) : 113 - 133
  • [2] Cluster-based instance selection for machine classification
    Ireneusz Czarnowski
    [J]. Knowledge and Information Systems, 2012, 30 : 113 - 133
  • [3] Cluster Integration for the Cluster-Based Instance Selection
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT I, 2010, 6421 : 353 - 362
  • [4] A cluster-based hybrid sampling approach for imbalanced data classification
    Feng, Shou
    Zhao, Chunhui
    Fu, Ping
    [J]. REVIEW OF SCIENTIFIC INSTRUMENTS, 2020, 91 (05):
  • [5] A New Cluster-based Instance Selection Algorithm
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    [J]. AGENT AND MULTI-AGENT SYSTEMS: TECHNOLOGIES AND APPLICATIONS, 2011, 6682 : 436 - 445
  • [6] An Approach to Imbalanced Data Classification Based on Instance Selection and Over-Sampling
    Czarnowski, Ireneusz
    Jedrzejowicz, Piotr
    [J]. COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT I, 2019, 11683 : 601 - 610
  • [7] Cluster-based sampling of multiclass imbalanced data
    Prachuabsupakij, Wanthanee
    Soonthornphisaj, Nuanwan
    [J]. INTELLIGENT DATA ANALYSIS, 2014, 18 (06) : 1109 - 1135
  • [8] Deep Learning with MCA-based Instance Selection and Bootstrapping for Imbalanced Data Classification
    Guan, Sheng
    Chen, Min
    Ha, Hsin-Yu
    Chen, Shu-Ching
    Shyu, Mei-Ling
    Zhang, Chengde
    [J]. 2015 IEEE CONFERENCE ON COLLABORATION AND INTERNET COMPUTING (CIC), 2015, : 288 - 295
  • [9] A Cluster-based Regrouping Approach for Imbalanced Data Distributions
    Yu, Wen
    Jiang, ShengYi
    [J]. 2012 WORLD AUTOMATION CONGRESS (WAC), 2012,
  • [10] Cluster-based sampling approaches to imbalanced data distributions
    Yen, Show-Jane
    Lee, Yue-Shi
    [J]. DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 427 - 436