Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Cited by: 0

Authors:

Mojahed, Atena Jalali [1]

Moattar, Mohammad Hossein [2]

Ghaffari, Hamidreza [1]

Affiliations:

[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran

[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran

Source:

BIG DATA AND COGNITIVE COMPUTING | 2024 / Volume 8 / Issue 9

Keywords:

imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation;

DOI:

10.3390/bdcc8090109

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and the differences in class density strongly affect the efficiency of classification algorithms. Therefore, class density is taken as the main basis for learning the new distance metric. The data of one class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal in this way, the distribution of each class is assumed to be a mixture of multivariate Gaussian densities. A density-based clustering algorithm determines the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Next, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier in imbalanced classification problems. Moreover, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance.
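The quantity the abstract optimizes is built from the Bhattacharya distance (commonly spelled Bhattacharyya) between Gaussian components. As a rough illustration only, the sketch below evaluates the standard closed-form distance between two multivariate Gaussians. The function name `bhattacharyya_gaussian` and its arguments are hypothetical, and the abstract does not specify how the mixture-level distance or the iterative maximization is defined, so neither is attempted here.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Hypothetical helper for illustration; not the paper's mixture-level objective.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    # Average covariance of the two components.
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T cov^{-1} diff, using solve instead of an explicit inverse.
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))),
    # computed with log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(mean_term + cov_term)


if __name__ == "__main__":
    identity = np.eye(2)
    print(bhattacharyya_gaussian([0.0, 0.0], identity, [2.0, 0.0], identity))
```

In the usage example both components share the identity covariance, so only the mean term contributes and the distance is (1/8) · 4 = 0.5. The distance is zero for identical components and grows as the means move apart or the covariances differ, which is why it can serve as a separation score between class components.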

Pages: 27
