A Nearest Neighbor Algorithm for Imbalanced Classification

Cited by: 2
Authors
Viola, Remi [1, 3]
Emonet, Remi [1]
Habrard, Amaury [1]
Metzler, Guillaume [2]
Riou, Sebastien [3]
Sebban, Marc [1]
Affiliations
[1] Univ Lyon, UJM St Etienne, Grad Sch, Hubert Curien Lab, Inst Opt, CNRS, UMR 5516, St Etienne, France
[2] Univ Lyon, Lyon2, ERIC, UR 3083, 5 Ave Pierre Mendes France, F-69676 Bron, France
[3] French Minist Econ & Finances, Direct Gen Finances Publ, Paris, France
Keywords
Machine learning; nearest neighbor algorithm; imbalanced classification
DOI
10.1142/S0218213021500135
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because accuracy-driven methods cannot adequately address the challenging problem of learning from imbalanced data, several alternative measures have been proposed in the literature, such as the Area Under the ROC Curve (AUC), the Average Precision (AP), the F-measure, and the G-Mean. However, these measures are neither smooth, convex, nor separable, which makes their direct optimization hard in practice. In this paper, we tackle imbalanced learning from a nearest-neighbor (NN) classification perspective, where the minority examples typically belong to the class of interest. Based on simple geometrical ideas, we introduce an algorithm that rescales the distance between a query sample and any positive training example. This modifies the Voronoi regions and thus the decision boundaries of the NN classifier. We provide a theoretical justification for this scaling scheme, which inherently aims at reducing the False Negative rate while controlling the number of False Positives. We further formally establish a link between the proposed method and cost-sensitive learning. An extensive experimental study on many public imbalanced datasets shows that our method is very effective compared with popular nearest-neighbor algorithms, is comparable to state-of-the-art sampling methods, and even yields the best performance when combined with them.
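The distance-rescaling idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `scaled_nn_predict`, the multiplicative factor `gamma`, the Euclidean metric, and the majority vote over `k` neighbors are all assumptions chosen for illustration. The sketch only shows how shrinking the distance to positive (minority) training examples enlarges their Voronoi regions.

```python
import numpy as np


def scaled_nn_predict(X_train, y_train, X_query, gamma=0.5, k=1):
    """Hypothetical k-NN with rescaled distances to positive examples.

    Assumption: positives are labeled 1, negatives 0, and the distance to
    every positive training example is multiplied by `gamma`. With
    gamma < 1 positives look closer, which favors predicting the
    minority class; gamma = 1 recovers plain k-NN.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_query = np.asarray(X_query, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = X_query[:, None, :] - X_train[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Rescale only the columns that correspond to positive examples.
    scale = np.where(y_train == 1, gamma, 1.0)
    dists = dists * scale[None, :]

    # Indices of the k nearest training points under the rescaled distance.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Majority vote: predict positive when more than half the neighbors are.
    votes = y_train[nearest].mean(axis=1)
    return (votes > 0.5).astype(int)


# Toy data: one negative at 0, one positive at 10, query at 4.
X_train = [[0.0], [10.0]]
y_train = [0, 1]
X_query = [[4.0]]

plain = scaled_nn_predict(X_train, y_train, X_query, gamma=1.0)
scaled = scaled_nn_predict(X_train, y_train, X_query, gamma=0.5)
```

In the toy data, the query is at distance 4 from the negative and 6 from the positive, so plain 1-NN predicts the negative class. With `gamma=0.5` the rescaled distance to the positive becomes 3, and the prediction flips to positive: the decision boundary has moved toward the negative example. How `gamma` should be chosen is not specified in the abstract; tuning it on held-out data against a measure such as the F-measure would be one plausible option.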
Pages: 27