RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs

Cited by: 36
Authors
Fan, Yingying [1 ]
Demirkaya, Emre [2 ]
Li, Gaorong [3 ]
Lv, Jinchi [1 ]
Affiliations
[1] Univ Southern Calif, Marshall Sch Business, Data Sci & Operat Dept, Los Angeles, CA 90089 USA
[2] Univ Tennessee, Dept Business Analyt & Stat, Haslam Coll Business, Knoxville, TN USA
[3] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing, Peoples R China
Keywords
Big data; Graphical nonlinear knockoffs; High-dimensional nonlinear models; Large-scale inference and FDR; Power; Reproducibility; Robustness; FALSE DISCOVERY RATE; VARIABLE SELECTION; UNKNOWN SPARSITY; REGRESSION; TESTS; IDENTIFICATION; BOOTSTRAP; RATES;
DOI
10.1080/01621459.2018.1546589
Chinese Library Classification (CLC)
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure introduced recently by Candes, Fan, Janson, and Lv in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. Moving away from this ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate an unknown covariate distribution. We provide theoretical justification of the robustness of the modified procedure by showing that the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one with the estimated covariate distribution. To the best of our knowledge, this is the first formal theoretical result on power for the knockoffs procedure. Simulation results demonstrate that, compared with existing approaches, our method performs competitively in both FDR control and power. A real dataset is analyzed to further assess the performance of the suggested knockoffs procedure. Supplementary materials for this article are available online.
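As a reading aid, the following is a minimal, self-contained Python sketch of the general model-X knockoffs workflow the abstract refers to: equicorrelated Gaussian knockoffs built from a (known or estimated) covariate covariance matrix, lasso coefficient-difference statistics, and the knockoff+ threshold targeting FDR level q. All function names, parameter choices, and the toy data are illustrative assumptions, not the authors' RANK procedure, which additionally handles an estimated covariate distribution within a Gaussian graphical model.

# Sketch of model-X Gaussian knockoffs with the knockoff+ filter.
# Assumptions: centered covariates with covariance Sigma, equicorrelated
# knockoff construction, lasso coefficient-difference statistics.
import numpy as np
from numpy.linalg import eigvalsh, inv, cholesky
from sklearn.linear_model import LassoCV

def gaussian_knockoffs(X, Sigma, rng):
    """Sample equicorrelated model-X knockoffs for rows of X ~ N(0, Sigma)."""
    p = Sigma.shape[0]
    # Equicorrelated s_j, shrunk slightly so the conditional covariance stays positive definite.
    s_val = min(1.0, 2.0 * eigvalsh(Sigma).min()) * 0.999
    S = s_val * np.eye(p)
    Sigma_inv = inv(Sigma)
    mu = X - X @ Sigma_inv @ S                  # conditional mean of the knockoffs
    V = 2.0 * S - S @ Sigma_inv @ S             # conditional covariance of the knockoffs
    return mu + rng.standard_normal(X.shape) @ cholesky(V).T

def knockoff_plus_select(X, Xk, y, q=0.2):
    """Run the knockoff+ filter with lasso coefficient-difference statistics W_j."""
    p = X.shape[1]
    beta = LassoCV(cv=5).fit(np.hstack([X, Xk]), y).coef_
    W = np.abs(beta[:p]) - np.abs(beta[p:])
    for t in np.sort(np.abs(W[W != 0])):        # smallest threshold meeting the FDR bound
        if (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= q:
            return np.flatnonzero(W >= t)
    return np.array([], dtype=int)

# Toy usage: n = 400, p = 50, 10 true signals, AR(1) covariate covariance.
rng = np.random.default_rng(0)
n, p = 400, 50
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
beta_true = np.zeros(p)
beta_true[:10] = 1.5
y = X @ beta_true + rng.standard_normal(n)
Xk = gaussian_knockoffs(X, Sigma, rng)          # with RANK, Sigma would be estimated from data
print("selected:", knockoff_plus_select(X, Xk, y, q=0.2))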
Pages: 362-379
Page count: 18
Related Papers
50 records in total
  • [1] RANK: Large-scale inference with graphical nonlinear knockoffs
    Fan, Yingying
    Demirkaya, Emre
    Li, Gaorong
    Lv, Jinchi
    arXiv, 2017,
  • [2] Fast distributed MAP inference for large-scale graphical models
    Soares, Claudia
    Gomes, Joao
    PROCEEDINGS OF 18TH INTERNATIONAL CONFERENCE ON SMART TECHNOLOGIES (IEEE EUROCON 2019), 2019,
  • [3] Multiscale Gaussian graphical models and algorithms for large-scale inference
    Choi, Myung Jin
    Willsky, Alan S.
    2007 IEEE/SP 14TH WORKSHOP ON STATISTICAL SIGNAL PROCESSING, VOLS 1 AND 2, 2007, : 229 - 233
  • [4] On Scale-Free Prior Distributions and Their Applicability in Large-Scale Network Inference with Gaussian Graphical Models
    Sheridan, Paul
    Kamimura, Takeshi
    Shimodaira, Hidetoshi
    COMPLEX SCIENCES, PT 1, 2009, 4 : 110 - 117
  • [5] Exploiting network topology for large-scale inference of nonlinear reaction models
    Galagali, Nikhil
    Marzouk, Youssef M.
    JOURNAL OF THE ROYAL SOCIETY INTERFACE, 2019, 16 (152)
  • [6] LARGE-SCALE RANK OF TEICHMULLER SPACE
    Eskin, Alex
    Masur, Howard
    Rafi, Kasra
    DUKE MATHEMATICAL JOURNAL, 2017, 166 (08) : 1517 - 1572
  • [7] On the tripling algorithm for large-scale nonlinear matrix equations with low rank structure
    Dong, Ning
    Yu, Bo
    JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS, 2015, 288 : 18 - 32
  • [8] LARGE-SCALE INFERENCE WITH BLOCK STRUCTURE
    Kou, Jiyao
    Walther, Guenther
ANNALS OF STATISTICS, 2022, 50 (03) : 1541 - 1572
  • [9] Reproducible learning in large-scale graphical models
    Zhou, Jia
    Li, Yang
    Zheng, Zemin
    Li, Daoji
    JOURNAL OF MULTIVARIATE ANALYSIS, 2022, 189
  • [10] Scalable Algorithms for Bayesian Inference of Large-Scale Models from Large-Scale Data
    Ghattas, Omar
    Isaac, Tobin
    Petra, Noemi
    Stadler, Georg
    HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2016, 2017, 10150 : 3 - 6