Manifold-based synthetic oversampling with manifold conformance estimation

Cited by: 46
Authors
Bellinger, Colin [1]
Drummond, Christopher [3]
Japkowicz, Nathalie [2]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] American Univ, Dept Comp Sci, Washington, DC USA
[3] Natl Res Council Canada, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Class imbalance; Synthetic oversampling; Manifold learning; SMOTE; Dimensionality reduction; Determining number; Neural networks; Classification
DOI
10.1007/s10994-017-5670-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification domains such as those in medicine, national security and the environment regularly suffer from a lack of training instances for the class of interest. In many cases, classification models induced under these conditions have poor predictive performance on the important minority class. Synthetic oversampling can be applied to mitigate the impact of imbalance by generating additional training instances. In this field, the majority of research has focused on refining the SMOTE algorithm. We note, however, that the generative bias of SMOTE is not appropriate for the large class of learning problems that conform to the manifold property. These are high-dimensional problems, such as image and spectral classification, with implicit feature spaces that are lower-dimensional than their physical data spaces. We show that ignoring this can lead to instances being generated in erroneous regions of the data space. We propose a general framework for manifold-based synthetic oversampling that helps users to select a domain-appropriate manifold learning method, such as PCA or autoencoder, and apply it to model and generate additional training samples. We evaluate data generation on theoretical distributions and image classification tasks that are standard in the manifold learning literature, and empirically show its positive impact on the classification of high-dimensional image and gamma-ray spectra tasks, along with 16 UCI datasets.
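The abstract names PCA as one manifold learning method that can be used to model the minority class and generate new samples. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: the function name `pca_oversample`, its parameters, and the choice of Gaussian noise scaled by the per-component standard deviation are all hypothetical choices made for illustration. It projects minority instances onto the top principal components, perturbs them in that low-dimensional space, and maps them back to the data space, so synthetic points stay on the fitted linear subspace rather than being interpolated directly in the data space as in SMOTE.

```python
import numpy as np


def pca_oversample(X_min, n_new, n_components, noise_scale=0.1, seed=0):
    """Generate synthetic minority samples inside a PCA subspace.

    Illustrative sketch only; names and the noise model are assumptions.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)

    # Centre the minority class and find its principal directions via SVD.
    mean = X_min.mean(axis=0)
    X_centered = X_min - mean
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    W = Vt[:n_components]              # shape (k, d): top-k directions

    # Low-dimensional coordinates of the original minority instances.
    Z = X_centered @ W.T               # shape (n, k)

    # Pick seed instances with replacement and perturb them in the
    # low-dimensional space (assumed noise model: isotropic Gaussian
    # scaled by each component's standard deviation).
    idx = rng.integers(0, len(Z), size=n_new)
    sigma = Z.std(axis=0)
    noise = rng.normal(size=(n_new, n_components)) * noise_scale * sigma
    Z_new = Z[idx] + noise

    # Map the perturbed coordinates back to the original data space.
    return Z_new @ W + mean


if __name__ == "__main__":
    # Toy minority class: a 1-D line embedded in 3-D space.
    t = np.linspace(-1.0, 1.0, 15)
    X = t[:, None] * np.array([1.0, 2.0, 3.0]) + np.array([0.5, -1.0, 2.0])
    synthetic = pca_oversample(X, n_new=20, n_components=1)
    print(synthetic.shape)
```

A nonlinear method such as an autoencoder would replace the projection and back-projection steps with an encoder and a decoder; the overall structure of the sketch would stay the same.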
Pages: 605-637
Page count: 33
Related papers
50 in total
  • [1] Manifold-based synthetic oversampling with manifold conformance estimation
    Colin Bellinger
    Christopher Drummond
    Nathalie Japkowicz
    [J]. Machine Learning, 2018, 107 : 605 - 637
  • [2] Manifold-Based Learning and Synthesis
    Huang, Dong
    Yi, Zhang
    Pu, Xiaorong
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2009, 39 (03) : 592 - 606
  • [3] Manifold-based discriminant analysis
    Liu, Z.-B.
    [J]. Science Press, 2013, (35)
  • [4] Manifold-based surfaces with boundaries
    Tosun, Elif
    Zorin, Denis
    [J]. COMPUTER AIDED GEOMETRIC DESIGN, 2011, 28 (01) : 1 - 22
  • [5] Manifold Construction and Parameterization for Nonlinear Manifold-Based Model Reduction
    Gu, Chenjie
    Roychowdhury, Jaijeet
    [J]. 2010 15TH ASIA AND SOUTH PACIFIC DESIGN AUTOMATION CONFERENCE (ASP-DAC 2010), 2010 : 202 - 207
  • [6] Manifold-Based Visual Object Counting
    Wang, Yi
    Zou, Yuexian
    Wang, Wenwu
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27 (07) : 3248 - 3263
  • [7] Estimation of arrival times from seismic waves: a manifold-based approach
    Taylor, Kye M.
    Procopio, Michael J.
    Young, Christopher J.
    Meyer, Francois G.
    [J]. GEOPHYSICAL JOURNAL INTERNATIONAL, 2011, 185 (01) : 435 - 452
  • [8] A Grassmann Manifold-based Domain Adaptation Approach
    Zheng, Jingjing
    Liu, Ming-Yu
    Chellappa, Rama
    Phillips, P. Jonathon
    [J]. 2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012 : 2095 - 2099
  • [9] Manifold-based sparse representation for opinion mining
    Karimi, Zohre
    [J]. SCIENTIFIC REPORTS, 2023, 13 (01)