A fuzzy rough set-based undersampling approach for imbalanced data

Authors
Zhang, Xiao [1]
He, Zhaoqian [1]
Yang, Yanyan [2]
Affiliations
[1] Xian Univ Technol, Dept Appl Math, 58 Yanxiang Rd, Xian 710054, Shaanxi, Peoples R China
[2] Beijing Jiaotong Univ, Sch Software Engn, Beixiaguan Rd, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Fuzzy rough sets; Undersampling; Instance selection; Classifiers; Reduction
DOI
10.1007/s13042-023-02064-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Handling imbalanced data effectively is an active research topic in machine learning and data mining. Undersampling is a popular technique for dealing with imbalanced data: it selects a subset of instances from the majority class of an imbalanced dataset so that the dataset becomes balanced. However, traditional undersampling approaches may lose information carried by the majority class instances. Therefore, building on the concept of the importance degree of a fuzzy granule, this paper presents a criterion for selecting representative instances from the majority class by considering the fuzzy relations between the k-nearest neighbors of a majority class instance and the minority class instances. On this basis, we put forward an undersampling approach based on fuzzy rough sets (USFRS). With USFRS, the representativeness of the selected majority class instances can be guaranteed and the information loss due to undersampling is kept as small as possible. Furthermore, USFRS is compared with related undersampling methods, and the differences in the experimental results are analyzed with statistical tests. The experimental results demonstrate that USFRS performs well in classification on imbalanced data.
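The abstract does not give the USFRS selection criterion itself, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a Gaussian-kernel fuzzy similarity relation, uses the standard fuzzy rough lower approximation (the infimum over minority instances of one minus the similarity), and scores each majority instance with a hypothetical average over its k nearest majority neighbours. The function names, the parameters k and sigma, and the scoring rule are all illustrative assumptions.

```python
import math


def fuzzy_similarity(x, y, sigma=1.0):
    """Gaussian-kernel fuzzy similarity in [0, 1] (an assumed choice of relation)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def lower_approx_membership(x, minority, sigma=1.0):
    """Membership of x in the fuzzy lower approximation of the majority class:
    the infimum over minority instances y of 1 - R(x, y)."""
    return min(1.0 - fuzzy_similarity(x, y, sigma) for y in minority)


def undersample_majority(majority, minority, k=3, sigma=1.0):
    """Keep as many majority instances as there are minority instances,
    preferring those with the highest illustrative score."""
    scores = []
    for i, x in enumerate(majority):
        # k nearest majority-class neighbours of x (excluding x itself)
        others = [j for j in range(len(majority)) if j != i]
        others.sort(key=lambda j: math.dist(x, majority[j]))
        granule = [i] + others[:k]
        # Hypothetical score: mean lower-approximation membership over the granule
        score = sum(
            lower_approx_membership(majority[j], minority, sigma) for j in granule
        ) / len(granule)
        scores.append((score, i))
    n_keep = min(len(minority), len(majority))
    ranked = sorted(scores, key=lambda t: (-t[0], t[1]))
    keep = sorted(i for _, i in ranked[:n_keep])
    return [majority[i] for i in keep]


if __name__ == "__main__":
    majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
    minority = [(5.5, 5.5), (6.0, 6.0)]
    print(undersample_majority(majority, minority, k=2))
```

The sketch returns a balanced majority subset of the same size as the minority class. Any real comparison with USFRS would need the criterion defined in the paper.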
Pages: 2799-2810
Number of pages: 12