Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Cited by: 0

Authors:

Mojahed, Atena Jalali [1]

Moattar, Mohammad Hossein [2]

Ghaffari, Hamidreza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran

[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran

Source:

BIG DATA AND COGNITIVE COMPUTING | 2024 / Vol. 8 / Issue 09

Keywords:

imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation;

DOI:

10.3390/bdcc8090109

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the efficiency of classification algorithms. Therefore, class density is taken as the main basis for learning the new distance metric. The data of one class may be composed of several densities; that is, the class is a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm determines the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Then, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier in imbalanced classification problems. Also, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance.
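
The abstract does not state the distance formula itself. As a rough illustration only, the sketch below computes the standard closed-form Bhattacharyya distance between two Gaussian components, under the simplifying assumption that both covariances are diagonal. The function name `bhattacharyya_diag` and the example values are hypothetical; the paper works with full Gaussian mixtures and learns a metric on top of this quantity, which this sketch does not attempt.

```python
import math


def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two Gaussians.

    Assumption (not from the paper): both covariance matrices are diagonal,
    so the distance is a sum of independent per-dimension terms.
    """
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        avg = 0.5 * (v1 + v2)  # averaged variance for this dimension
        # Term driven by the separation of the means.
        dist += 0.125 * (m1 - m2) ** 2 / avg
        # Term driven by the mismatch between the variances.
        dist += 0.5 * math.log(avg / math.sqrt(v1 * v2))
    return dist


# Identical components give zero; moving one mean away increases the distance.
same = bhattacharyya_diag([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0])
apart = bhattacharyya_diag([0.0, 0.0], [1.0, 1.0], [2.0, 0.0], [1.0, 1.0])
print(same, apart)
```

A distance of this kind grows both when component means move apart and when their spreads differ, which is why it is a natural objective for separating class densities.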


Pages: 27

50 records in total

[1] Supervised kernel-based multi-modal Bhattacharya distance learning for imbalanced data classification
Mojahed, Atena Jalali
Moattar, Mohammad Hossein
Ghaffari, Hamidreza
KNOWLEDGE AND INFORMATION SYSTEMS, 2025, 67 (01) : 247 - 272
[2] A Density-Based Random Forest for Imbalanced Data Classification
Dong, Jia
Qian, Quan
FUTURE INTERNET, 2022, 14 (03):
[3] A novel instance density-based hybrid resampling for imbalanced classification problems
You-Jin Park
Chung-Kang Ma
Soft Computing, 2025, 29 (4) : 2031 - 2045
[4] A gravitational density-based mass sharing method for imbalanced data classification
Rahmati, Farshad
Nezamabadi-pour, Hossein
Nikpour, Bahareh
SN APPLIED SCIENCES, 2020, 2 (02):
[5] A gravitational density-based mass sharing method for imbalanced data classification
Farshad Rahmati
Hossein Nezamabadi-pour
Bahareh Nikpour
SN Applied Sciences, 2020, 2
[6] CDBH: A clustering and density-based hybrid approach for imbalanced data classification
Mirzaei, Behzad
Nikpour, Bahareh
Nezamabadi-pour, Hossein
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 164
[7] LDAS: Local density-based adaptive sampling for imbalanced data classification
Yan, Yuanting
Jiang, Yifei
Zheng, Zhong
Yu, Chengjin
Zhang, Yiwen
Zhang, Yanping
EXPERT SYSTEMS WITH APPLICATIONS, 2022, 191
[8] Nearest neighbors and density-based undersampling for imbalanced data classification with class overlap
Sun, Peiqi
Du, Yanhui
Xiong, Siyun
NEUROCOMPUTING, 2024, 609
[9] Distance Metric Learning for Kernel Density-Based Acoustic Model Under Limited Training Data Conditions
Van Hai Do
Xiao, Xiong
Chng, Eng Siong
Li, Haizhou
2015 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2015, : 54 - 58
[10] A new instance density-based synthetic minority oversampling method for imbalanced classification problems
Ma, Chung-Kang
Park, You-Jin
ENGINEERING OPTIMIZATION, 2022, 54 (10) : 1743 - 1757
