RANK: Large-Scale Inference With Graphical Nonlinear Knockoffs

Cited by: 36
Authors
Fan, Yingying [1]
Demirkaya, Emre [2]
Li, Gaorong [3]
Lv, Jinchi [1]
Affiliations
[1] Univ Southern Calif, Marshall Sch Business, Data Sci & Operat Dept, Los Angeles, CA 90089 USA
[2] Univ Tennessee, Dept Business Analyt & Stat, Haslam Coll Business, Knoxville, TN USA
[3] Beijing Univ Technol, Beijing Inst Sci & Engn Comp, Beijing, Peoples R China
Keywords
Big data; Graphical nonlinear knockoffs; High-dimensional nonlinear models; Large-scale inference and FDR; Power; Reproducibility; Robustness; FALSE DISCOVERY RATE; VARIABLE SELECTION; UNKNOWN SPARSITY; REGRESSION; TESTS; IDENTIFICATION; BOOTSTRAP; RATES;
DOI
10.1080/01621459.2018.1546589
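Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;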
Abstract
Power and reproducibility are key to enabling refined scientific discoveries in contemporary big data applications with general high-dimensional nonlinear models. In this article, we provide theoretical foundations on the power and robustness of the model-X knockoffs procedure, introduced recently by Candes, Fan, Janson, and Lv, in the high-dimensional setting where the covariate distribution is characterized by a Gaussian graphical model. We establish that, under mild regularity conditions, the power of the oracle knockoffs procedure with known covariate distribution in high-dimensional linear models is asymptotically one as the sample size goes to infinity. When moving away from the ideal case, we suggest a modified model-X knockoffs method, called graphical nonlinear knockoffs (RANK), to accommodate the unknown covariate distribution. We provide theoretical justifications on the robustness of our modified procedure by showing that, with the estimated covariate distribution, the false discovery rate (FDR) is asymptotically controlled at the target level and the power is asymptotically one. To the best of our knowledge, this is the first formal theoretical result on the power of the knockoffs procedure. Simulation results demonstrate that, compared to existing approaches, our method performs competitively in both FDR control and power. A real dataset is analyzed to further assess the performance of the suggested knockoffs procedure. Supplementary materials for this article are available online.
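For readers unfamiliar with how a knockoffs procedure targets a given FDR level, the following is a minimal Python sketch of the standard knockoff+ selection step: given one statistic W_j per variable (large positive values suggest a true signal), it finds the smallest threshold whose estimated false discovery proportion is at most the target level q. This is an illustration only and not the authors' RANK implementation; the function name `knockoff_plus_select` and the example statistics are hypothetical, and the construction of knockoff variables and of the W_j themselves is not shown.

```python
def knockoff_plus_select(w, q):
    """Return indices selected by the knockoff+ rule at target FDR level q.

    w : list of knockoff statistics W_j (hypothetical inputs; how they are
        computed from the data and the knockoffs is outside this sketch).
    q : target FDR level, e.g. 0.2.
    """
    # Candidate thresholds are the distinct nonzero magnitudes |W_j|.
    candidates = sorted({abs(x) for x in w if x != 0})
    for t in candidates:
        neg = sum(1 for x in w if x <= -t)  # proxy for false discoveries
        pos = sum(1 for x in w if x >= t)   # discoveries at threshold t
        # Knockoff+ estimate of the false discovery proportion.
        if (1 + neg) / max(pos, 1) <= q:
            return [j for j, x in enumerate(w) if x >= t]
    # No threshold meets the target level: select nothing.
    return []


# Made-up statistics purely for illustration.
w_example = [5.0, 4.2, -0.3, 3.1, 0.0, 2.5, -1.0, 6.3, 0.4, 2.9]
selected = knockoff_plus_select(w_example, q=0.2)
print(selected)
```

The "+1" in the numerator is what distinguishes knockoff+ from the plain knockoff threshold; it makes the rule more conservative when few variables are selected.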
Pages: 362-379
Page count: 18