Fuzzy Distance-based Undersampling Technique for Imbalanced Flood Data

Cited by: 0
Authors
Mahamud, Ku Ruhana Ku [1]
Zorkeflee, Maisarah [1]
Din, Aniza Mohamed [1]
Affiliations
[1] Univ Utara Malaysia, Changlun, Malaysia
Keywords
imbalanced flood data; resampling technique; fuzzy distance-based undersampling; fuzzy logic;
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Classifier performance degrades on imbalanced data because instances in the minority class are often ignored. Imbalanced data occur in many application domains, including flood data. Misclassifying a flood case has a greater impact than misclassifying a non-flood case. Numerous resampling techniques, such as undersampling and oversampling, have been used to overcome the misclassification of imbalanced data. However, undersampling can eliminate relevant data and oversampling can cause overfitting, both of which may lead to poor classification results. This paper proposes a Fuzzy Distance-based Undersampling (FDUS) technique to increase classification accuracy. Entropy estimation is used to generate fuzzy thresholds, which are used to categorise the instances in the majority and minority classes into membership functions. The performance of FDUS was compared with three other techniques on flood data, based on F-measure and G-mean. FDUS achieved better F-measure and G-mean than the other techniques, which indicates that FDUS was able to reduce the elimination of relevant data.
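The abstract evaluates techniques by F-measure and G-mean but does not reproduce their formulas. The following is a minimal sketch using the commonly used definitions (F-measure as the harmonic mean of precision and recall, G-mean as the geometric mean of sensitivity and specificity); the paper's exact variants may differ. The function names and the toy labels are hypothetical and are not taken from the paper, and the sketch does not implement FDUS itself.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority (flood) label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t != positive and p == positive:
            fp += 1
        elif t == positive and p != positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall (0.0 when undefined)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Illustrative labels only (assumption): 2 flood cases among 10 instances.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts "non-flood" ignores the minority class.
always_negative = [0] * 10
tp, fp, tn, fn = confusion_counts(y_true, always_negative)
accuracy = (tp + tn) / len(y_true)
print(accuracy, f_measure(tp, fp, fn), g_mean(tp, fp, tn, fn))

# A classifier that finds one of the two flood cases, with one false alarm.
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(f_measure(tp, fp, fn), g_mean(tp, fp, tn, fn))
```

In the first case accuracy is high even though no flood case is detected, while both F-measure and G-mean are zero, which is why these two metrics are preferred over plain accuracy for imbalanced data.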
Pages: 509-513
Page count: 5
Related papers
50 items in total
  • [31] Distance-based clustering of CGH data
    Liu, Jun
    Mohammed, Jaaved
    Carter, James
    Ranka, Sanjay
    Kahveci, Tamer
    Baudis, Michael
    [J]. BIOINFORMATICS, 2006, 22 (16) : 1971 - 1978
  • [32] Distance-based ANOVA for functional data
    Pedott, Alexandre Homsi
    Fogliatto, Flavio Sanson
    [J]. EUROPEAN JOURNAL OF INDUSTRIAL ENGINEERING, 2016, 10 (06) : 760 - 776
  • [33] Numerical Data Classification via Distance-Based Similarity Measures of Fuzzy Parameterized Fuzzy Soft Matrices
    Memis, Samet
    Enginoglu, Serdar
    Erkan, Ugur
    [J]. IEEE ACCESS, 2021, 9 : 88583 - 88601
  • [34] Imbalanced credit card fraud detection data: A solution based on hybrid neural network and clustering-based undersampling technique
    Huang, Huajie
    Liu, Bo
    Xue, Xiaoyu
    Cao, Jiuxin
    Chen, Xinyi
    [J]. APPLIED SOFT COMPUTING, 2024, 154
  • [35] Fuzzy rule-based oversampling technique for imbalanced and incomplete data learning
    Liu, Gencheng
    Yang, Youlong
    Li, Benchong
    [J]. KNOWLEDGE-BASED SYSTEMS, 2018, 158 : 154 - 174
  • [36] Fuzzy Measure Based Approximate Reasoning with Distance-based Operators
    Marata Takacs
    [J]. PROCEEDINGS OF THE 8TH WSEAS INTERNATIONAL CONFERENCE ON APPLIED INFORMATICS AND COMMUNICATIONS, PTS I AND II: NEW ASPECTS OF APPLIED INFORMATICS AND COMMUNICATIONS, 2008, : 159 - +
  • [37] A novel progressively undersampling method based on the density peaks sequence for imbalanced data
    Xie, Xiaoying
    Liu, Huawen
    Zeng, Shouzhen
    Lin, Lingbin
    Li, Wen
    [J]. KNOWLEDGE-BASED SYSTEMS, 2021, 213
  • [38] FUZZY AND SMOTE RESAMPLING TECHNIQUE FOR IMBALANCED DATA SETS
    Zorkeflee, Maisarah
    Din, Aniza Mohamed
    Ku-Mahamud, Ku Ruhana
    [J]. PROCEEDINGS OF THE 5TH INTERNATIONAL CONFERENCE ON COMPUTING & INFORMATICS, 2015, : 638 - 643
  • [39] Distance-Based Data Mining Over Encrypted Data
    Tex, Christine
    Schaeler, Martin
    Boehm, Klemens
    [J]. 2018 IEEE 34TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE), 2018, : 1264 - 1267
  • [40] Clustering Based Undersampling for Effective Learning from Imbalanced Data: An Iterative Approach
    Bhattacharya R.
    De R.
    Chakraborty A.
    Sarkar R.
    [J]. SN Computer Science, 5 (4)