Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Cited by: 0
Authors
Mojahed, Atena Jalali [1]
Moattar, Mohammad Hossein [2]
Ghaffari, Hamidreza [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
Keywords
imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation
DOI
10.3390/bdcc8090109
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored to highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the performance of classification algorithms. Therefore, class density is taken as the main basis for learning the new distance metric. The data of a single class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm is used to determine the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori (MAP) density estimation. Next, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To obtain a large between-class margin, the distance between external components is increased while the distance between internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the performance of the classifier on imbalanced classification problems. Moreover, even when the imbalance ratio is very high and minority-class samples cannot be identified correctly, the proposed method still provides acceptable performance.
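The abstract does not name the density-based clustering algorithm used to choose the number of Gaussian components per class. As a minimal sketch, assuming DBSCAN is that algorithm, the number of clusters found among one class's samples (noise excluded) can serve as that class's component count. The function names `dbscan` and `count_components` and the values of `eps` and `min_pts` are illustrative assumptions, not parameters from the paper.

```python
import math


def dbscan(points, eps, min_pts):
    """Plain DBSCAN over a list of coordinate tuples (assumed clustering choice).

    Returns one label per point: a cluster id (0, 1, ...) or -1 for noise.
    A point is a core point if at least `min_pts` points (itself included)
    lie within Euclidean distance `eps`.
    """
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            neighbours = region(j)
            if len(neighbours) >= min_pts:
                queue.extend(neighbours)  # j is a core point: grow the cluster through it
    return labels


def count_components(points, eps, min_pts):
    """Hypothetical helper: number of density-based clusters (noise excluded)."""
    labels = dbscan(points, eps, min_pts)
    return len({lab for lab in labels if lab != -1}), labels
```

For example, `count_components([(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)], eps=0.5, min_pts=2)` returns `(2, [0, 0, 0, 1, 1, 1, -1])`: two components, with the isolated point treated as noise. In practice the clustering would be run separately on the samples of each class.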
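The abstract states that the component parameters are estimated by maximum a posteriori (MAP) density estimation but does not give the prior. The sketch below assumes a conjugate Normal-inverse-Wishart prior with hyperparameters `mu0`, `kappa0`, `psi0`, `nu0` (all hypothetical choices). It returns the joint posterior mode of the mean and covariance for the samples of one cluster. One practical effect of such a prior is that adding a positive-definite `psi0` keeps the covariance estimate invertible even when a minority-class cluster has fewer samples than dimensions.

```python
def map_gaussian(points, mu0, kappa0, psi0, nu0):
    """MAP (joint posterior mode) mean and covariance of one Gaussian component.

    Assumes a Normal-inverse-Wishart prior NIW(mu0, kappa0, psi0, nu0); this
    prior is an illustrative assumption, not taken from the paper.
    """
    n, d = len(points), len(points[0])
    xbar = [sum(p[k] for p in points) / n for k in range(d)]
    # Scatter matrix S = sum_i (x_i - xbar)(x_i - xbar)^T
    scatter = [[sum((p[a] - xbar[a]) * (p[b] - xbar[b]) for p in points)
                for b in range(d)] for a in range(d)]
    # Conjugate posterior update
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = [(kappa0 * mu0[k] + n * xbar[k]) / kappa_n for k in range(d)]
    shift = [xbar[k] - mu0[k] for k in range(d)]
    factor = kappa0 * n / kappa_n
    psi_n = [[psi0[a][b] + scatter[a][b] + factor * shift[a] * shift[b]
              for b in range(d)] for a in range(d)]
    # Joint mode of the NIW posterior: Sigma = psi_n / (nu_n + d + 2)
    sigma = [[psi_n[a][b] / (nu_n + d + 2) for b in range(d)] for a in range(d)]
    return mu_n, sigma
```

For the one-dimensional samples `[(1.0,), (3.0,)]` with `mu0=[0.0]`, `kappa0=1.0`, `psi0=[[1.0]]`, `nu0=1.0`, the MAP mean is 4/3 and the MAP variance is 17/18. The prior pulls both toward `mu0` and `psi0` compared with the sample mean 2 and variance 1.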
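For two Gaussian components N(mu1, Sigma1) and N(mu2, Sigma2), the Bhattacharya distance has the standard closed form D_B = (1/8)(mu1 - mu2)^T Sigma^{-1} (mu1 - mu2) + (1/2) ln(det Sigma / sqrt(det Sigma1 * det Sigma2)), where Sigma = (Sigma1 + Sigma2)/2.

Between two Gaussian mixtures there is no closed form, and the abstract does not say how the mixture-level distance is computed. As a hypothetical illustration only, `margin_objective` below does the following:

- It reads "external" as pairs of components from different classes and "internal" as pairs from the same class (an interpretation).
- It scores a set of components by the weighted external pair distances minus `lam` times the weighted internal pair distances.

The pairwise weighting, `lam`, and all helper names are assumptions, not the authors' formulation.

```python
import math
from itertools import combinations


def det_and_inverse(a):
    """Determinant and inverse of a square matrix (list of rows) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        p = aug[col][col]
        det *= p
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return det, [row[n:] for row in aug]


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    d = len(mu1)
    cov = [[(cov1[i][j] + cov2[i][j]) / 2.0 for j in range(d)] for i in range(d)]
    det_avg, inv_avg = det_and_inverse(cov)
    det1, _ = det_and_inverse(cov1)
    det2, _ = det_and_inverse(cov2)
    diff = [a - b for a, b in zip(mu1, mu2)]
    mahalanobis_sq = sum(diff[i] * inv_avg[i][j] * diff[j] for i in range(d) for j in range(d))
    return mahalanobis_sq / 8.0 + 0.5 * math.log(det_avg / math.sqrt(det1 * det2))


def margin_objective(components, lam=1.0):
    """HYPOTHETICAL mixture-level score (not the paper's formulation).

    components: list of (class_label, weight, mean, cov) tuples.
    Pairs from different classes ("external") add w1 * w2 * D_B;
    pairs from the same class ("internal") subtract lam * w1 * w2 * D_B.
    """
    between = within = 0.0
    for (c1, w1, m1, s1), (c2, w2, m2, s2) in combinations(components, 2):
        term = w1 * w2 * bhattacharyya_gaussian(m1, s1, m2, s2)
        if c1 == c2:
            within += term
        else:
            between += term
    return between - lam * within
```

With unit covariances, N(0, 1) and N(2, 1) are at distance 0.5, because only the mean term contributes. An iterative metric-learning scheme would try to increase a score like `margin_objective`, pushing components of different classes apart and pulling components of the same class together.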
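The method is evaluated with a KNN classifier. If the learned metric takes the Mahalanobis form d(x, q) = sqrt((x - q)^T M (x - q)) with M = L^T L, then KNN under that metric is the same as ordinary Euclidean KNN after mapping every point through L. The sketch below assumes the learned metric is available as such a matrix `L`. The Mahalanobis form is an assumption, since the abstract does not state it. The helper names and the hand-picked matrices in the usage note are hypothetical; they are not output of the proposed method.

```python
import math
from collections import Counter


def transform(L, x):
    """Apply the linear map x -> L x, where L is given as a list of rows."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def knn_predict(train_x, train_y, query, k, L):
    """Majority vote of the k nearest training points under ||L x - L q||.

    This equals a Mahalanobis-type distance with M = L^T L (assumed metric form).
    """
    q = transform(L, query)
    ranked = sorted(
        (math.dist(transform(L, x), q), y) for x, y in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

Consider training points `(0, 0)` and `(0, 10)` labeled `"maj"` and `(1, 5)` and `(1, 15)` labeled `"min"`, with the query `(0.9, 0.0)`.

- With the identity map, the query's nearest neighbor is `(0, 0)`, so 1-NN predicts `"maj"`.
- Shrinking the uninformative second feature with `L = [[1, 0], [0, 0.01]]` makes `(1, 5)` the nearest neighbor, so 1-NN predicts `"min"`.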
Pages: 27