False discovery rate;
Extreme value;
High dimension;
Sparsity;
DOI:
10.1007/978-3-319-34139-2_15
Chinese Library Classification number:
O29 [Applied Mathematics];
Discipline classification code:
070104;
Abstract:
In recent years, much work has been done on high dimensional problems, in both theory and applications, as high dimensional data become more common in broad areas such as microarray data analysis. One important issue in multiple testing with high dimensional data is controlling the significance level of large scale simultaneous tests in order to select the significant genes from among a huge number of candidates. In many cases, the true null distribution is assumed to be known or to follow a parametric distribution, so that p-values can be easily calculated. In practice, the null distribution may be misspecified or may differ from the assumed distribution. In this paper, we consider a false discovery rate (FDR) procedure based on extreme values, which is less sensitive to inaccurate p-values. The normalized test statistics are assumed to be approximately standard normal by the central limit theorem (CLT). We show that, compared with the CLT approximation, the FDR procedure with extreme values achieves a more accurate simultaneous test level under weaker conditions on the sample sizes. We provide simulation studies and a real data example to compare the performance of our proposed procedure with that of an existing procedure.