Estimating classification probabilities in high-dimensional diagnostic studies

Cited by: 6
Authors
Appel, Inka J. [1]
Gronwald, Wolfram [1]
Spang, Rainer [1]
Affiliations
[1] Univ Regensburg, Inst Funct Genom, D-93053 Regensburg, Germany
Keywords
GENE; CANCER; DISEASE
DOI
10.1093/bioinformatics/btr434
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Classification algorithms for high-dimensional biological data, such as gene expression profiles or metabolomic fingerprints, are typically evaluated by the number of misclassifications across a test dataset. However, to judge the classification of a single case in the context of clinical diagnosis, we need to assess the uncertainties associated with that individual case rather than the average accuracy across many cases. The reliability of individual classifications can be expressed in terms of class probabilities. While classification algorithms are a well-developed area of research, the estimation of class probabilities is considerably less advanced in biology, and only a few classification algorithms provide estimated class probabilities.

Results: We compared several probability estimators in the context of classification of metabolomics profiles. Evaluation criteria included sparseness biases, calibration of the estimator, the variance of the estimator and its performance in identifying highly reliable classifications. We observed that several of the estimators display artifacts that compromise their use in practice. Classification probabilities based on a combination of local cross-validation error rates and monotone regression prove superior in metabolomic profiling.
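The abstract names monotone regression as one ingredient of the best-performing estimator but does not spell out the procedure. As a rough illustration of that ingredient only, the sketch below fits a non-decreasing step function from classifier scores to observed 0/1 outcomes using the pool-adjacent-violators algorithm. It is not the authors' implementation: the function names `pava` and `fit_monotone_calibrator`, the toy scores and labels, and the use of raw scores in place of local cross-validation error rates are all assumptions made for the example.

```python
from bisect import bisect_right


def pava(y, w=None):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Merge backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, n2 = blocks.pop()
            m1, w1, n1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, n1 + n2])
    # Expand the blocks back to one fitted value per input point.
    fitted = []
    for mean, _, n in blocks:
        fitted.extend([mean] * n)
    return fitted


def fit_monotone_calibrator(scores, labels):
    """Return a function mapping a score to an estimated class probability.

    Hypothetical helper: scores are any real-valued classifier outputs,
    labels are 0/1 outcomes observed on held-out cases.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    xs = [scores[i] for i in order]
    ys = pava([labels[i] for i in order])

    def predict(score):
        # Step function: take the fitted value of the largest training
        # score that does not exceed the query (clamped at the left end).
        k = bisect_right(xs, score) - 1
        return ys[max(k, 0)]

    return predict


# Toy data (invented for illustration, not taken from the study).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 1, 1, 0, 1]
calibrate = fit_monotone_calibrator(scores, labels)
print([round(calibrate(s), 3) for s in scores])
```

Because the fitted values are averages of 0/1 outcomes within pooled blocks, they stay in [0, 1] and never decrease as the score increases, which is the property that makes them readable as probabilities for an individual case.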
Pages: 2563-2570
Number of pages: 8