Manifold-based synthetic oversampling with manifold conformance estimation

Cited by: 46
Authors
Bellinger, Colin [1]
Drummond, Christopher [3]
Japkowicz, Nathalie [2]
Affiliations
[1] Univ Alberta, Dept Comp Sci, Edmonton, AB, Canada
[2] American Univ, Dept Comp Sci, Washington, DC USA
[3] Natl Res Council Canada, Ottawa, ON, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Class imbalance; Synthetic oversampling; Manifold learning; SMOTE; Dimensionality reduction; Determining number; Neural networks; Classification
DOI
10.1007/s10994-017-5670-4
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classification domains such as those in medicine, national security and the environment regularly suffer from a lack of training instances for the class of interest. In many cases, classification models induced under these conditions have poor predictive performance on the important minority class. Synthetic oversampling can be applied to mitigate the impact of imbalance by generating additional training instances. In this field, the majority of research has focused on refining the SMOTE algorithm. We note, however, that the generative bias of SMOTE is not appropriate for the large class of learning problems that conform to the manifold property. These are high-dimensional problems, such as image and spectral classification, with implicit feature spaces that are lower-dimensional than their physical data spaces. We show that ignoring this can lead to instances being generated in erroneous regions of the data space. We propose a general framework for manifold-based synthetic oversampling that helps users to select a domain-appropriate manifold learning method, such as PCA or autoencoder, and apply it to model and generate additional training samples. We evaluate data generation on theoretical distributions and image classification tasks that are standard in the manifold learning literature, and empirically show its positive impact on the classification of high-dimensional image and gamma-ray spectra tasks, along with 16 UCI datasets.
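The idea in the abstract of generating minority-class samples along a learned manifold rather than in the raw data space can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes PCA (computed via SVD) as the manifold learning method, and it assumes that new samples are formed by adding scaled Gaussian noise to the low-dimensional codes of existing minority instances before mapping back to data space. The function name `pca_oversample` and the parameters `n_components`, `n_new`, `noise_scale`, and `seed` are hypothetical, chosen only for illustration.

```python
import numpy as np


def pca_oversample(X_min, n_components, n_new, noise_scale=0.1, seed=0):
    """Generate synthetic minority samples on a PCA subspace (illustrative sketch).

    Assumption: perturbing instances in the PCA code space with Gaussian
    noise is used here as one simple way to sample along a linear manifold.
    """
    rng = np.random.default_rng(seed)

    # Centre the minority-class data and find its principal directions.
    mean = X_min.mean(axis=0)
    centered = X_min - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]          # shape (k, d)

    # Encode each instance as low-dimensional coordinates on the subspace.
    codes = centered @ components.T         # shape (n, k)

    # Pick random seed instances and jitter them in the code space,
    # scaling the noise by the spread of the data along each component.
    picks = rng.integers(0, len(X_min), size=n_new)
    spread = codes.std(axis=0)
    noise = rng.normal(0.0, 1.0, size=(n_new, n_components)) * (noise_scale * spread)
    new_codes = codes[picks] + noise

    # Decode back to the original data space; results stay on the subspace.
    return new_codes @ components + mean


# Toy minority class: 20 points on a straight line embedded in 3-D.
t = np.linspace(-1.0, 1.0, 20)
direction = np.array([1.0, 2.0, 3.0]) / np.sqrt(14.0)
X_min = np.outer(t, direction) + 5.0

X_new = pca_oversample(X_min, n_components=1, n_new=50)
```

Because the noise is added in the code space and then decoded, every synthetic point in this toy example stays on the line that the minority data occupies. Interpolation or noise applied directly in the 3-D data space offers no such guarantee when the manifold is curved, which is the failure mode the abstract attributes to SMOTE. A nonlinear manifold would need a nonlinear model, such as the autoencoder the abstract mentions, in place of PCA.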
Pages: 605-637
Page count: 33