On Asymptotic Distributions and Confidence Intervals for LIFT Measures in Data Mining

Cited by: 3
Authors
Jiang, Wenxin [1, 2]
Zhao, Yu [3 ]
Affiliations
[1] Shandong Univ, Jinan, Shandong, Peoples R China
[2] Northwestern Univ, Stat, Evanston, IL 60208 USA
[3] Amazon, Seattle, WA USA
Keywords
Empirical process; Functional delta method; %response; Subsampling; Validation data; BANDS;
DOI
10.1080/01621459.2014.993080
Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract
A LIFT measure, such as the response rate, lift, or the percentage of captured response, is a fundamental measure of effectiveness for a scoring rule obtained from data mining, which is estimated from a set of validation data. In this article, we study how to construct confidence intervals of the LIFT measures. We point out the subtlety of this task and explain how simple binomial confidence intervals can have incorrect coverage probabilities, due to omitting variation from the sample percentile of the scoring rule. We derive the asymptotic distribution using some advanced empirical process theory and the functional delta method in the Appendix. The additional variation is shown to be related to a conditional mean response, which can be estimated by a local averaging of the responses over the scores from the validation data. Alternatively, a subsampling method is shown to provide a valid confidence interval, without needing to estimate the conditional mean response. Numerical experiments are conducted to compare these different methods regarding the coverage probabilities and the lengths of the resulting confidence intervals.
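The subsampling idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure, which the abstract does not spell out; it is a generic subsampling interval for the response rate among the top-scored fraction of a validation set, assuming a root-n convergence rate. The function names, the `top_frac` parameter, and the subsample size `b` are all hypothetical choices made for illustration.

```python
import numpy as np


def response_rate(scores, responses, top_frac):
    """Mean response among the top `top_frac` fraction of cases ranked by score.

    The cutoff is the sample percentile of the scores, which is itself random;
    this is the source of the extra variation the abstract refers to.
    """
    n = len(scores)
    k = max(1, int(np.ceil(top_frac * n)))
    top_idx = np.argsort(-scores, kind="stable")[:k]
    return responses[top_idx].mean()


def subsampling_ci(scores, responses, top_frac, b, n_draws=2000, alpha=0.05, seed=0):
    """Generic subsampling interval (assumption: sqrt(n) rate, b much smaller than n).

    Each draw takes b cases without replacement, recomputes the response rate
    (including the percentile cutoff), and records sqrt(b) * (theta_b - theta_n).
    Quantiles of these roots are rescaled by sqrt(n) to form the interval.
    """
    rng = np.random.default_rng(seed)
    n = len(scores)
    theta_n = response_rate(scores, responses, top_frac)
    roots = np.empty(n_draws)
    for i in range(n_draws):
        idx = rng.choice(n, size=b, replace=False)
        theta_b = response_rate(scores[idx], responses[idx], top_frac)
        roots[i] = np.sqrt(b) * (theta_b - theta_n)
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    lower = theta_n - hi_q / np.sqrt(n)
    upper = theta_n - lo_q / np.sqrt(n)
    return theta_n, lower, upper


if __name__ == "__main__":
    # Synthetic validation data (fabricated purely for this demo).
    rng = np.random.default_rng(42)
    scores = rng.uniform(size=5000)
    responses = (rng.uniform(size=5000) < 0.1 + 0.6 * scores).astype(float)
    print(subsampling_ci(scores, responses, top_frac=0.1, b=500))
```

Because every subsample recomputes its own percentile cutoff, the variation from the sample percentile is carried into the interval without estimating the conditional mean response separately. The choice of `b` affects coverage in practice and would need tuning.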
Pages: 1717-1725
Number of pages: 9