Estimating the probabilities of misclassification using CV when the dimension and the sample sizes are large

Times cited: 0
Author
Nakagawa, Tomoyuki [1]
Affiliation
[1] Tokyo University of Science, Faculty of Science and Technology, Department of Information Sciences, Noda, Chiba 278-8510, Japan
Keywords
Discriminant analysis; Classification; Probability of misclassification; Cross-validation; Asymptotic expansion; High-dimensional; Bias correction
DOI
Not available
Chinese Library Classification number
O1 [Mathematics]
Subject classification code
0701; 070101
Abstract
In this paper, we study the estimation of the probabilities of misclassification for high-dimensional data. Cross-validation (CV) is commonly used to estimate these probabilities. When the sample sizes are large, CV yields a nearly unbiased estimate using only the original data. However, the properties of CV are not well understood when the dimension is large compared with the sample sizes. We therefore investigate the asymptotic properties of CV when both the dimension and the sample sizes become large. Furthermore, we propose three methods for correcting the bias of CV that can be used with high-dimensional data, and we evaluate the performance of the resulting estimators in simulation studies.
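For readers unfamiliar with the baseline estimator, the sketch below shows ordinary leave-one-out CV for estimating the two misclassification probabilities e(2|1) and e(1|2) of a two-group linear discriminant rule. It is a minimal illustration under stated assumptions, not the author's method: it assumes Fisher's linear discriminant W(x) with the pooled sample covariance matrix, assigns x to group 1 when W(x) > 0, and does not implement any of the paper's three bias corrections. The function names are hypothetical. Because the pooled covariance must be invertible after one observation is removed, the sketch requires n1 + n2 - 3 >= p, which is the regime where p can be large but still below the total sample size.

```python
import numpy as np


def pooled_covariance(X1, X2):
    """Unbiased pooled sample covariance of two groups (rows are observations)."""
    n1, n2 = X1.shape[0], X2.shape[0]
    # atleast_2d keeps the p = 1 case as a 1x1 matrix.
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    return ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)


def lda_discriminant(x, mean1, mean2, S):
    """Assumed rule: W(x) = (m1 - m2)' S^{-1} (x - (m1 + m2)/2); W > 0 -> group 1."""
    # Solve S a = (m1 - m2) rather than forming S^{-1} explicitly.
    a = np.linalg.solve(S, mean1 - mean2)
    return float(a @ (x - (mean1 + mean2) / 2.0))


def loo_cv_error_rates(X1, X2):
    """Leave-one-out CV estimates of e(2|1) and e(1|2)."""
    n1, n2 = X1.shape[0], X2.shape[0]

    # Each group-1 observation is held out, the rule is refitted, then it is classified.
    miss1 = 0
    for i in range(n1):
        X1_minus = np.delete(X1, i, axis=0)
        S = pooled_covariance(X1_minus, X2)
        w = lda_discriminant(X1[i], X1_minus.mean(axis=0), X2.mean(axis=0), S)
        if w <= 0:  # assigned to group 2, so misclassified
            miss1 += 1

    # Same procedure for each group-2 observation.
    miss2 = 0
    for j in range(n2):
        X2_minus = np.delete(X2, j, axis=0)
        S = pooled_covariance(X1, X2_minus)
        w = lda_discriminant(X2[j], X1.mean(axis=0), X2_minus.mean(axis=0), S)
        if w > 0:  # assigned to group 1, so misclassified
            miss2 += 1

    return miss1 / n1, miss2 / n2


if __name__ == "__main__":
    # Illustrative simulation: two normal groups differing in the first coordinate.
    rng = np.random.default_rng(0)
    p, n1, n2 = 20, 40, 40
    delta = np.zeros(p)
    delta[0] = 2.0
    X1 = rng.standard_normal((n1, p)) + delta
    X2 = rng.standard_normal((n2, p))
    e21, e12 = loo_cv_error_rates(X1, X2)
    print(f"CV estimate of e(2|1): {e21:.3f}, of e(1|2): {e12:.3f}")
```

Refitting the rule n1 + n2 times keeps the code easy to check; faster closed-form leave-one-out updates exist but are omitted here. Comparing such CV estimates with the true error rates over repeated simulations as p grows toward n1 + n2 is one way to see the bias that motivates the corrections described in the abstract.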
Pages: 373-411
Number of pages: 39
Related papers (50 in total)
  • [1] EPMC estimation in discriminant analysis when the dimension and sample sizes are large
    Tonda, Tetsuji
    Nakagawa, Tomoyuki
    Wakaki, Hirofumi
    [J]. HIROSHIMA MATHEMATICAL JOURNAL, 2017, 47 (01) : 43 - 62
  • [2] A test for the equality of covariance matrices when the dimension is large relative to the sample sizes
    Schott, James R.
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2007, 51 (12) : 6535 - 6542
  • [3] Estimating change in landscape elements using different sample sizes
    Howard, DC
    Barr, CJ
    [J]. QUANTITATIVE APPROACHES TO LANDSCAPE ECOLOGY, 2000 : 61 - 69
  • [4] Sample sizes when using multiple linear regression for prediction
    Knofczynski, Gregory T.
    Mundfrom, Daniel
    [J]. EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT, 2008, 68 (03) : 431 - 442
  • [5] An Update on Using the Range to Estimate σ When Determining Sample Sizes
    Rhiel, George Steven
    Markowski, Edward
    [J]. PSYCHOLOGICAL REPORTS, 2017, 120 (02) : 319 - 331
  • [6] Estimating the Single Nucleotide Polymorphism Genotype Misclassification From Routine Double Measurements in a Large Epidemiologic Sample
    Heid, Iris M.
    Lamina, Claudia
    Kuechenhoff, Helmut
    Fischer, Guido
    Klopp, Norman
    Kolz, Melanie
    Grallert, Harald
    Vollmert, Caren
    Wagner, Stefanie
    Huth, Cornelia
    Mueller, Julia
    Mueller, Martina
    Hunt, Steven C.
    Peters, Annette
    Paulweber, Bernhard
    Wichmann, H. -Erich
    Kronenberg, Florian
    Illig, Thomas
    [J]. AMERICAN JOURNAL OF EPIDEMIOLOGY, 2008, 168 (08) : 878 - 889
  • [7] A simple method for estimating genetic diversity in large populations from finite sample sizes
    Bashalkhanov, Stanislav
    Pandey, Madhav
    Rajora, Om P.
    [J]. BMC GENETICS, 2009, 10
  • [8] Monitoring Variation in a Multivariate Process When the Dimension is Large Relative to the Sample Size
    Mason, Robert L.
    Chou, Youn-Min
    Young, John C.
    [J]. COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2009, 38 (06) : 939 - 951
  • [9] PREMIUM AND PROTECTION OF SEVERAL PROCEDURES FOR DEALING WITH OUTLIERS WHEN SAMPLE SIZES ARE MODERATE TO LARGE
    GUTTMAN, I
    [J]. ANNALS OF MATHEMATICAL STATISTICS, 1971, 42 (06) : 2191 - &