Computing confidence intervals from massive data via penalized quantile smoothing splines
Authors:
Zhang, Likun [1]
del Castillo, Enrique [1,2]
Berglund, Andrew J. [3]
Tingley, Martin P. [3]
Govind, Nirmal [3]
Affiliations:
[1] Penn State Univ, Dept Stat, State Coll, PA 16801 USA
[2] Penn State Univ, Dept Ind & Mfg Engn, State Coll, PA 16801 USA
[3] Netflix Inc, Los Gatos, CA 95032 USA
Keywords:
A/B testing;
Bag of little bootstraps;
Cross-validation;
Penalized splines;
Quantile smoothing;
REGRESSION;
BOOTSTRAP;
ALGORITHM;
DOI:
10.1016/j.csda.2019.106885
Chinese Library Classification number:
TP39 [Computer applications];
Discipline classification code:
081203 ;
0835 ;
Abstract:
New methodology is presented for the computation of pointwise confidence intervals from massive response data sets in one or two covariates using robust and flexible quantile regression splines. Novel aspects of the method include a new cross-validation procedure for selecting the penalization coefficient and a reformulation of the quantile smoothing problem based on a weighted data representation. These innovations permit uncertainty quantification and fast parameter selection in very large data sets via a distributed "bag of little bootstraps". Experiments with synthetic data demonstrate that the computed confidence intervals feature empirical coverage rates that are generally within 2% of the nominal rates. The approach is broadly applicable to the analysis of large data sets in one or two dimensions. Comparative (or "A/B") experiments conducted at Netflix aimed at optimizing the quality of streaming video originally motivated this work, but the proposed methods have general applicability. The methodology is illustrated using an open-source application: the comparison of geo-spatial climate model scenarios from NASA's Earth Exchange. (C) 2019 Elsevier B.V. All rights reserved.