Supervised Density-Based Metric Learning Based on Bhattacharya Distance for Imbalanced Data Classification Problems

Authors
Mojahed, Atena Jalali [1]
Moattar, Mohammad Hossein [2]
Ghaffari, Hamidreza [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Ferdows Branch, Ferdows, Iran
[2] Islamic Azad Univ, Dept Comp Engn, Mashhad Branch, Mashhad, Iran
Keywords
imbalanced data classification; distance metric learning; Bhattacharya divergence; class density estimation
DOI
10.3390/bdcc8090109
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning distance metrics and distinguishing between samples from different classes are among the most important topics in machine learning. This article proposes a new distance metric learning approach tailored for highly imbalanced datasets. Imbalanced datasets suffer from a lack of data in the minority class, and differences in class density strongly affect the efficiency of classification algorithms. Therefore, class density is taken as the main basis for learning the new distance metric. The data of one class may be composed of several densities, that is, the class may be a combination of several normal distributions with different means and variances. Because classes may be multimodal, the distribution of each class is modeled as a mixture of multivariate Gaussian densities. A density-based clustering algorithm determines the number of components, and the parameters of the Gaussian components are then estimated using maximum a posteriori density estimation. Then, the Bhattacharya distance between the Gaussian mixtures of the classes is maximized using an iterative scheme. To reach a large between-class margin, the distance between the external components is increased while the distance between the internal components is decreased. The proposed method is evaluated on 15 imbalanced datasets using the k-nearest neighbor (KNN) classifier. The experimental results show that the proposed method significantly improves the efficiency of the classifier in imbalanced classification problems. Moreover, when the imbalance ratio is very high and it is not possible to correctly identify minority class samples, the proposed method still provides acceptable performance.
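As a rough illustration of the quantity the abstract refers to, the sketch below computes the standard closed-form Bhattacharyya distance (spelled "Bhattacharya" in the record) between two multivariate Gaussian components. The function name `bhattacharyya_gaussians` and the example parameters are assumptions made for illustration. The abstract does not state how the distance is aggregated over mixture components or how the metric is updated iteratively, so neither is shown here.

```python
import numpy as np


def bhattacharyya_gaussians(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2).

    Hypothetical helper for illustration; not taken from the paper.
    """
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    # Average covariance of the two components.
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T * cov^{-1} * diff, using solve instead of an explicit inverse.
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2))),
    # computed with log-determinants for numerical stability.
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(mean_term + cov_term)


# Two unit-covariance components whose means are 2 apart along one axis.
d = bhattacharyya_gaussians([0.0, 0.0], np.eye(2), [2.0, 0.0], np.eye(2))
print(d)  # 0.5
```

The distance is zero for identical components and grows as the means separate or the covariances diverge, which is why maximizing it between classes tends to enlarge the between-class margin.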
Pages: 27