Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Authors
Mojahed, Atena Jalali [1]
Moattar, Mohammad Hossein [2]
Ghaffari, Hamidreza [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
Keywords
imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation
DOI
10.3390/bdcc8090109
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored to highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the efficiency of classification algorithms. Class density is therefore taken as the main basis for learning the new distance metric. The data of a single class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm determines the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Next, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier on imbalanced classification problems. Moreover, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance.
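As a rough illustration of the quantity the abstract refers to, the sketch below computes the standard closed-form Bhattacharya (also spelled Bhattacharyya) distance between two multivariate Gaussian components. It is not the authors' implementation: the abstract does not state how the distance is extended to whole mixtures or how the iterative maximization works, so only the pairwise component formula is shown. The function name `bhattacharyya_gaussians` and its parameters are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Illustrative helper (hypothetical name); covers a single pair of Gaussian
    components only, not the mixture-level distance used in the paper.
    """
    mu1 = np.atleast_1d(np.asarray(mu1, dtype=float))
    mu2 = np.atleast_1d(np.asarray(mu2, dtype=float))
    cov1 = np.atleast_2d(np.asarray(cov1, dtype=float))
    cov2 = np.atleast_2d(np.asarray(cov2, dtype=float))

    # Average covariance of the two components.
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T cov^{-1} diff, using solve() instead of an
    # explicit matrix inverse.
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))),
    # computed from log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(mean_term + cov_term)


if __name__ == "__main__":
    # Two unit-covariance components whose means are 2 apart along one axis.
    d = bhattacharyya_gaussians([0.0, 0.0], np.eye(2), [2.0, 0.0], np.eye(2))
    print(d)
```

The distance is zero for identical components and grows as the means separate or the covariances diverge, which is why increasing it between components of different classes widens the between-class margin.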
Pages: 27