Data complexity assessment in undersampled classification of high-dimensional biomedical data

Cited by: 25
Authors
Baumgartner, R [1]
Somorjai, RL [1]
Affiliations
[1] Natl Res Council Canada, Inst Biodiagnost, Winnipeg, MB R3B 1Y6, Canada
Keywords
classification; data complexity; regularization; undersampled biomedical problems
DOI
10.1016/j.patrec.2006.01.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Regularized linear classifiers have been successfully applied to undersampled (i.e., small sample size, high dimensionality) biomedical classification problems. Separately, data complexity measures have been proposed to assess the competence of a classifier in a particular context. Our work was motivated by Elden's analysis of ill-posed regression problems and by the interpretation of linear discriminant analysis as a mean square error classifier. Using Singular Value Decomposition analysis, we define a discriminatory power spectrum and show that it provides a useful means of data complexity assessment for undersampled classification problems. On five real-life biomedical data sets of increasing difficulty, we demonstrate how the data complexity of a classification problem can be related to the performance of regularized linear classifiers. We show that the concentration of the discriminatory power, as manifested in the discriminatory power spectrum, is a deciding factor for the success of regularized linear classifiers in undersampled classification problems. As a practical outcome of our work, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem. (c) 2006 Elsevier B.V. All rights reserved.
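The abstract does not give the formula for the discriminatory power spectrum, so the following is only a minimal sketch under a stated assumption: that the spectrum can be approximated by the squared projections of the centered class-label vector onto the left singular vectors of the centered data matrix, in the spirit of an SVD analysis of a least-squares problem. The function name `discriminatory_power_spectrum` and the tolerance `rel_tol` are hypothetical and do not come from the paper.

```python
import numpy as np


def discriminatory_power_spectrum(X, y, rel_tol=1e-10):
    """Sketch of an SVD-based spectrum (assumed definition, not the paper's).

    X : array of shape (n_samples, n_features), typically n_samples << n_features
    y : array of shape (n_samples,), numeric class labels (e.g. 0/1)

    Returns the retained singular values and, for each one, the fraction of
    the centered label vector's energy captured by its left singular vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Center features and labels so the intercept is handled implicitly.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()

    # Thin SVD: columns of U span the sample space reachable by the data.
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)

    # Discard numerically zero singular values (centering removes one rank).
    keep = s > s.max() * rel_tol
    U, s = U[:, keep], s[keep]

    # Squared projections of the labels, normalized by the label energy.
    coeffs = U.T @ yc
    power = coeffs**2 / (yc @ yc)
    return s, power
```

Under this assumption, a spectrum whose mass sits in the first few components (those with the largest singular values) would suggest a problem that regularized linear classifiers can handle well, while a flat or tail-heavy spectrum would suggest a harder problem.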
Pages: 1383-1389
Number of pages: 7