Data complexity assessment in undersampled classification of high-dimensional biomedical data

Cited by: 25
Authors
Baumgartner, R [1 ]
Somorjai, RL [1 ]
Affiliations
[1] Natl Res Council Canada, Inst Biodiagnost, Winnipeg, MB R3B 1Y6, Canada
Keywords
classification; data complexity; regularization; undersampled biomedical problems;
DOI
10.1016/j.patrec.2006.01.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Regularized linear classifiers have been successfully applied to undersampled (i.e., small sample size, high dimensionality) biomedical classification problems. Additionally, the design of data complexity measures has been proposed as a way to assess the competence of a classifier in a particular context. Our work was motivated by Eldén's analysis of ill-posed regression problems and by the interpretation of linear discriminant analysis as a mean square error classifier. Using singular value decomposition analysis, we define a discriminatory power spectrum and show that it provides a useful means of data complexity assessment for undersampled classification problems. On five real-life biomedical data sets of increasing difficulty, we demonstrate how the data complexity of a classification problem can be related to the performance of regularized linear classifiers. We show that the concentration of the discriminatory power manifested in the discriminatory power spectrum is a deciding factor for the success of regularized linear classifiers in undersampled classification problems. As a practical outcome of our work, the proposed data complexity assessment may facilitate the choice of a classifier for a given undersampled problem. (c) 2006 Elsevier B.V. All rights reserved.
Pages: 1383-1389
Number of pages: 7
Related papers
50 in total
  • [31] Centroid particle swarm optimisation for high-dimensional data classification
    Yahya, Anwar Ali
    [J]. JOURNAL OF EXPERIMENTAL & THEORETICAL ARTIFICIAL INTELLIGENCE, 2018, 30 (06) : 857 - 886
  • [32] Effect of Data Discretization on the Classification Accuracy in a High-Dimensional Framework
    Tillander, Annika
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2012, 27 (04) : 355 - 374
  • [33] Feature Selection and Classification for High-Dimensional Incomplete Multimodal Data
    Deng, Wan-Yu
    Liu, Dan
    Dong, Ying-Ying
    [J]. MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [34] A novel feature learning framework for high-dimensional data classification
    Li, Yanxia
    Chai, Yi
    Yin, Hongpeng
    Chen, Bo
    [J]. INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (02) : 555 - 569
  • [35] Classification by ensembles from random partitions of high-dimensional data
    Ahn, Hongshik
    Moon, Hojin
    Fazzari, Melissa J.
    Lim, Noha
    Chen, James J.
    Kodell, Ralph L.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (12) : 6166 - 6179
  • [36] An Efficient and Versatile Variational Method for High-Dimensional Data Classification
    Cai, Xiaohao
    Chan, Raymond H.
    Xie, Xiaoyu
    Zeng, Tieyong
    [J]. JOURNAL OF SCIENTIFIC COMPUTING, 2024, 100 (03)
  • [37] Semisupervised Classification With Novel Graph Construction for High-Dimensional Data
    Yu, Zhiwen
    Ye, Fengxu
    Yang, Kaixiang
    Cao, Wenming
    Chen, C. L. Philip
    Cheng, Lianglun
    You, Jane
    Wong, Hau-San
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (01) : 75 - 88
  • [38] Ensemble of penalized logistic models for classification of high-dimensional data
    Ijaz, Musarrat
    Asghar, Zahid
    Gul, Asma
    [J]. COMMUNICATIONS IN STATISTICS-SIMULATION AND COMPUTATION, 2021, 50 (07) : 2072 - 2088
  • [39] High-dimensional spectral data classification with nonparametric feature screening
    Li, Chuan-Quan
    Xu, Qing-Song
    [J]. JOURNAL OF CHEMOMETRICS, 2020, 34 (03)
  • [40] A novel ensemble method for high-dimensional genomic data classification
    Espichan, Alexandra
    Villanueva, Edwin
    [J]. PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018, : 2229 - 2236