Estimating classification probabilities in high-dimensional diagnostic studies

Cited by: 6
Authors
Appel, Inka J. [1]
Gronwald, Wolfram [1]
Spang, Rainer [1]
Affiliations
[1] Univ Regensburg, Inst Funct Genom, D-93053 Regensburg, Germany
Keywords
GENE; CANCER; DISEASE;
DOI
10.1093/bioinformatics/btr434
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Motivation: Classification algorithms for high-dimensional biological data such as gene expression profiles or metabolomic fingerprints are typically evaluated by the number of misclassifications across a test dataset. However, to judge the classification of a single case in the context of clinical diagnosis, we need to assess the uncertainty associated with that individual case rather than the average accuracy across many cases. The reliability of an individual classification can be expressed in terms of class probabilities. While classification algorithms are a well-developed area of research, the estimation of class probabilities is considerably less advanced in biology, and only a few classification algorithms provide estimated class probabilities.
Results: We compared several probability estimators in the context of classifying metabolomics profiles. Evaluation criteria included sparseness biases, the calibration of the estimator, the variance of the estimator and its performance in identifying highly reliable classifications. We observed that several of the estimators display artifacts that compromise their use in practice. Classification probabilities based on a combination of local cross-validation error rates and monotone regression prove superior in metabolomic profiling.
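The abstract names monotone regression as one ingredient of the best-performing estimator but gives no details. As a rough illustration of that one ingredient only, the sketch below fits a non-decreasing function to 0/1 outcomes ordered by a classifier score, using the pool-adjacent-violators algorithm. It is not the authors' procedure: the local cross-validation step is omitted, and the function names `pava` and `calibrate` and the toy data are assumptions made for this example.

```python
def pava(values):
    """Pool Adjacent Violators: least-squares non-decreasing fit to `values`."""
    # Each block holds [sum, count]; adjacent blocks are merged
    # whenever the earlier block's mean exceeds the later block's mean.
    blocks = []
    for v in values:
        blocks.append([float(v), 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted


def calibrate(scores, labels):
    """Map each score to a monotone estimate of P(label = 1 | score).

    `scores` are hypothetical classifier outputs and `labels` are 0/1
    outcomes; the result is returned in the original sample order.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    fitted = pava([labels[i] for i in order])
    probs = [0.0] * len(scores)
    for rank, i in enumerate(order):
        probs[i] = fitted[rank]
    return probs


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    scores = [0.9, 0.1, 0.5, 0.4, 0.8]
    labels = [1, 0, 0, 1, 1]
    print(calibrate(scores, labels))
```

The monotonicity constraint encodes the assumption that a higher score should never correspond to a lower class probability, which is what makes the fitted values usable as calibrated probabilities.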
Pages: 2563-2570
Number of pages: 8