Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Cited by: 0
Authors
Mojahed, Atena Jalali [1]
Moattar, Mohammad Hossein [2]
Ghaffari, Hamidreza [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
Keywords
imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation
DOI
10.3390/bdcc8090109
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the efficiency of classification algorithms. Class density is therefore taken as the main basis for learning the new distance metric. The data of one class may be composed of several densities; that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm determines the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Next, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier in imbalanced classification problems. Moreover, when the imbalance ratio is very high and minority class samples cannot be identified correctly, the proposed method still provides acceptable performance.
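For readers unfamiliar with the quantity the abstract maximizes, the following is a minimal sketch of the standard closed-form Bhattacharyya distance between two single multivariate Gaussians. It is not the authors' implementation: the abstract does not state how the distance is aggregated over mixture components or how the iterative scheme updates the metric, so only the pairwise component-level formula is shown. The function name `bhattacharyya_gaussian` and its parameters are hypothetical, chosen for illustration.

```python
import numpy as np


def bhattacharyya_gaussian(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Hypothetical helper for illustration; assumes both covariance
    matrices are square, symmetric, and positive definite.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    # Average covariance used by both terms of the closed form.
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T cov^{-1} diff, via a linear solve
    # rather than an explicit matrix inverse.
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))),
    # computed with log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return mean_term + cov_term


if __name__ == "__main__":
    identity = np.eye(2)
    # Two unit-covariance Gaussians whose means are 2 apart along one axis.
    print(bhattacharyya_gaussian([0.0, 0.0], identity, [2.0, 0.0], identity))
```

With equal covariances the covariance term vanishes and the distance reduces to one eighth of the squared Mahalanobis distance between the means, which is why separating component means under a learned metric increases it.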
Pages: 27