Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Cited by: 0
Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation
DOI
Not available
Abstract
Factor analysis regression (FAR) of $y_i$ on $\mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in the high-dimensional case ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: arbitrary correlation, a factor-model correlation structure, and independence of $y$ and $\mathbf{x}$.
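The abstract does not name the computationally efficient factor-extraction method. As a minimal sketch under stated assumptions (not the authors' implementation), the code below fits a k-factor model to the joint vector $(y, \mathbf{x})$ by principal-component factoring, one inexpensive extraction method that still works when $p > n$, and predicts $y$ through the model-implied coefficient $\boldsymbol{\beta} = (\boldsymbol{\Lambda}_x\boldsymbol{\Lambda}_x^\top + \boldsymbol{\Psi}_x)^{-1}\boldsymbol{\Lambda}_x\boldsymbol{\lambda}_y$. The Woodbury identity reduces the inversion to a $k \times k$ system, so no $p \times p$ matrix is ever inverted. PCR is included for comparison. The function names, the `psi_floor` safeguard, and the simulated two-factor data are hypothetical.

```python
import numpy as np


def _center(X, y):
    """Column-center X and center y; return the means for the intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    return X - x_mean, y - y_mean, x_mean, y_mean


def pcr_fit(X, y, k):
    """Principal component regression: OLS of y on the top-k PC scores of X."""
    Xc, yc, x_mean, y_mean = _center(X, y)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)  # thin SVD, fine for p > n
    V_k = Vt[:k].T                                      # p x k principal directions
    scores = Xc @ V_k                                   # n x k, equals U_k * s_k
    gamma = (scores.T @ yc) / s[:k] ** 2                # scores.T @ scores = diag(s^2)
    beta = V_k @ gamma
    return beta, y_mean - x_mean @ beta


def far_fit(X, y, k, psi_floor=1e-3):
    """Hypothetical FAR sketch: k-factor model for the joint vector (y, x) via
    principal-component factoring (an assumed extraction method), then
    beta = (Lam_x Lam_x^T + Psi_x)^{-1} Lam_x lam_y from model-implied covariances."""
    n = X.shape[0]
    Xc, yc, x_mean, y_mean = _center(X, y)
    Z = np.column_stack([yc, Xc])                       # n x (p + 1), y in column 0
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    Lam = Vt[:k].T * (s[:k] / np.sqrt(n - 1))           # loadings = eigvec * sqrt(eigval)
    lam_y, Lam_x = Lam[0], Lam[1:]
    var_x = (Xc ** 2).sum(axis=0) / (n - 1)             # diagonal of S_xx
    psi = np.maximum(var_x - (Lam_x ** 2).sum(axis=1), psi_floor)  # uniquenesses
    # Woodbury: Sigma_xx^{-1} Lam_x = Psi^{-1} Lam_x (I + Lam_x^T Psi^{-1} Lam_x)^{-1}
    Psi_inv_Lam = Lam_x / psi[:, None]
    core = np.eye(k) + Lam_x.T @ Psi_inv_Lam            # only a k x k system
    beta = Psi_inv_Lam @ np.linalg.solve(core, lam_y)
    return beta, y_mean - x_mean @ beta


def msep(beta, b0, X, y):
    """Mean square error of prediction on (X, y)."""
    return float(np.mean((y - (X @ beta + b0)) ** 2))


# Usage: simulated two-factor data with p > n (illustrative settings only).
rng = np.random.default_rng(0)
n, p, n_test = 50, 200, 500
L = rng.normal(size=(p, 2))


def draw(m):
    F = rng.normal(size=(m, 2))
    X = F @ L.T + 0.5 * rng.normal(size=(m, p))
    y = F @ np.array([1.0, -1.0]) + 0.3 * rng.normal(size=m)
    return X, y


X, y = draw(n)
X_test, y_test = draw(n_test)
msep_pcr = msep(*pcr_fit(X, y, k=2), X_test, y_test)
msep_far = msep(*far_fit(X, y, k=2), X_test, y_test)
```

Under this strongly factor-structured setup both predictors should land well below the test variance of $y$; the sketch says nothing about the relative ranking the paper reports, which depends on the authors' actual extraction method and simulation design.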
Under each structure, we generated Monte Carlo training samples of size $n < p$ from a multivariate normal distribution with that correlation structure, with parameters fixed at estimates obtained from analyses of real data sets. Under the independence structure, we observed severe over-fitting by PLSR compared with FAR and PCR. Under the two dependent structures, FAR had a notably lower average mean square error of prediction than PCR, while the performance of FAR and PLSR was not notably different. Overall, therefore, FAR performed better than either PLSR or PCR.
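To make the independence scenario concrete, the following minimal sketch simulates $y$ independently of $\mathbf{x}$ with $n < p$ and compares the average mean square error of prediction of PLSR and PCR against a benchmark that predicts every test response by the training mean of $y$. PLSR here is the standard NIPALS PLS1 algorithm and PCR regresses on the top-k principal component scores. The sample sizes, number of components `k`, and replication count are arbitrary assumptions, not the paper's settings (which were fixed at estimates from real data), and the function names are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, k):
    """PLS1 regression via NIPALS with k components; returns (beta, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = E.T @ f
        w /= np.linalg.norm(w)           # weight vector
        t = E @ w                        # score vector
        tt = t @ t
        p_a = E.T @ t / tt               # X loading
        q_a = f @ t / tt                 # y loading
        E = E - np.outer(t, p_a)         # deflate X
        f = f - q_a * t                  # deflate y
        W.append(w)
        P.append(p_a)
        q.append(q_a)
    W, P = np.column_stack(W), np.column_stack(P)
    beta = W @ np.linalg.solve(P.T @ W, np.array(q))  # B = W (P^T W)^{-1} q
    return beta, y_mean - x_mean @ beta


def pcr_fit(X, y, k):
    """Principal component regression on the top-k PC scores of X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    beta = Vt[:k].T @ ((U[:, :k].T @ yc) / s[:k])     # gamma = S_k^{-1} U_k^T y
    return beta, y_mean - x_mean @ beta


def msep(beta, b0, X, y):
    return float(np.mean((y - (X @ beta + b0)) ** 2))


# Independence scenario: y is drawn independently of x, with n < p.
rng = np.random.default_rng(1)
n, p, n_test, k, reps = 30, 100, 200, 5, 50
results = {"pls_train": [], "pls_test": [], "pcr_test": [], "mean_only": []}
for _ in range(reps):
    X, y = rng.normal(size=(n, p)), rng.normal(size=n)
    X_test, y_test = rng.normal(size=(n_test, p)), rng.normal(size=n_test)
    beta, b0 = pls1_fit(X, y, k)
    results["pls_train"].append(msep(beta, b0, X, y))
    results["pls_test"].append(msep(beta, b0, X_test, y_test))
    results["pcr_test"].append(msep(*pcr_fit(X, y, k), X_test, y_test))
    # Benchmark: predict every test response by the training mean of y.
    results["mean_only"].append(float(np.mean((y_test - y.mean()) ** 2)))

avg = {name: float(np.mean(vals)) for name, vals in results.items()}
```

Because $y$ carries no information about $\mathbf{x}$ here, no method can beat the mean-only benchmark on average. With these settings one would expect PLSR's training error to fall well below the benchmark while its test error exceeds it, the over-fitting pattern the abstract describes, whereas PCR with the same `k` stays closer to the benchmark.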
Pages: 115–132
Number of pages: 17