Structured Ordinary Least Squares: A Sufficient Dimension Reduction approach for regressions with partitioned predictors and heterogeneous units

Cited by: 2
Authors
Liu, Yang [1]
Chiaromonte, Francesca [1]
Li, Bing [1]
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
Funding
National Science Foundation (USA)
Keywords
Data integration; Ordinary least squares; Structured data; Sufficient dimension reduction; Variable selection; Sliced inverse regression
DOI
10.1111/biom.12579
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
In many scientific and engineering fields, advanced experimental and computing technologies are producing data that are not just high dimensional, but also internally structured. For instance, statistical units may have heterogeneous origins from distinct studies or subpopulations, and features may be naturally partitioned based on experimental platforms generating them, or on information available about their roles in a given phenomenon. In a regression analysis, exploiting this known structure in the predictor dimension reduction stage that precedes modeling can be an effective way to integrate diverse data. To pursue this, we propose a novel Sufficient Dimension Reduction (SDR) approach that we call structured Ordinary Least Squares (sOLS). This combines ideas from existing SDR literature to merge reductions performed within groups of samples and/or predictors. In particular, it leads to a version of OLS for grouped predictors that requires far less computation than recently proposed groupwise SDR procedures, and provides an informal yet effective variable selection tool in these settings. We demonstrate the performance of sOLS by simulation and present a first application to genomic data. The R package sSDR, publicly available on CRAN, includes all procedures necessary to implement the sOLS approach.
Pages: 529-539
Page count: 11