Factor Analysis Regression for Predictive Modeling with High-Dimensional Data

Authors
Randy Carter
Netsanet Michael
Affiliations
[1] State University of New York at Buffalo, Department of Biostatistics
[2] The Boeing Company, Boeing Commercial Airplanes
Keywords
Bilinear factor model; Principal component analysis; Principal component regression; Partial least squares; Factor structure covariance matrix; Factor analysis regression; Mean square error of prediction; Monte Carlo studies; Cross-validation
Abstract
Factor analysis regression (FAR) of $y_i$ on $\mathbf{x}_i = (x_{1i}, x_{2i}, \ldots, x_{pi})$, $i = 1, 2, \ldots, n$, has been studied only in the low-dimensional case ($p < n$), using maximum likelihood (ML) factor extraction. The ML method breaks down in high-dimensional cases ($p > n$). In this paper, we develop a high-dimensional version of FAR based on a computationally efficient method of factor extraction. We compare the performance of our high-dimensional FAR with partial least squares regression (PLSR) and principal component regression (PCR) under three underlying correlation structures: an arbitrary correlation structure, a factor-model correlation structure, and independence of $y$ from $\mathbf{x}$.
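Why a factor-structure covariance matrix sidesteps the singularity of the sample covariance when $p > n$ can be illustrated with a short sketch. It assumes the common FAR formulation in which $(y_i, \mathbf{x}_i)$ jointly follow a $k$-factor model, so that $\Sigma_{xx} = \Lambda\Lambda^\top + \Psi$ with diagonal $\Psi$ and $\sigma_{xy} = \Lambda\lambda_y$; the symbols $\Lambda$, $\Psi$, $\lambda_y$ and the helper names `far_coefficients` and `far_predict` are hypothetical. The loadings are taken as given: this is not the authors' factor-extraction method, only the regression step that follows extraction. By the Woodbury identity, $\Sigma_{xx}^{-1}\Lambda = \Psi^{-1}\Lambda\,(I_k + \Lambda^\top\Psi^{-1}\Lambda)^{-1}$, so $\beta = \Sigma_{xx}^{-1}\sigma_{xy}$ needs only a $k \times k$ solve.

```python
import numpy as np


def far_coefficients(Lambda, psi, lambda_y):
    """Hypothetical helper: beta = Sigma_xx^{-1} sigma_xy under an assumed k-factor model.

    Lambda   : (p, k) loadings of x on the k common factors
    psi      : (p,) unique variances of x, all strictly positive
    lambda_y : (k,) loadings of y on the same factors
    Assumes Sigma_xx = Lambda @ Lambda.T + diag(psi) and sigma_xy = Lambda @ lambda_y.
    """
    k = Lambda.shape[1]
    psi_inv_L = Lambda / psi[:, None]           # Psi^{-1} Lambda, shape (p, k)
    M = np.eye(k) + Lambda.T @ psi_inv_L        # I_k + Lambda^T Psi^{-1} Lambda, shape (k, k)
    # Woodbury identity: Sigma_xx^{-1} Lambda = Psi^{-1} Lambda M^{-1},
    # so only a k x k system is solved, never a p x p one.
    return psi_inv_L @ np.linalg.solve(M, lambda_y)


def far_predict(X_new, x_mean, y_mean, beta):
    """Hypothetical helper: predict y for new rows of x using centered predictors."""
    return y_mean + (X_new - x_mean) @ beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p, k = 2000, 3                              # p can far exceed any training sample size
    Lambda = rng.normal(size=(p, k))
    psi = rng.uniform(0.5, 1.5, size=p)
    lambda_y = rng.normal(size=k)
    beta = far_coefficients(Lambda, psi, lambda_y)
    print(beta.shape)
```

Because $\Psi$ is diagonal with positive entries, $\Lambda\Lambda^\top + \Psi$ is positive definite for any $p$, whereas the $p \times p$ sample covariance of $n < p$ observations has rank at most $n - 1$ and cannot be inverted.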
Under each structure, we generated Monte Carlo training samples of sizes $n < p$ from a multivariate normal distribution having that structure. Parameters were fixed at estimates obtained from analyses of real data sets. Under the independence structure, PLSR over-fitted severely compared with FAR and PCR. Under the two dependent structures, FAR had a notably lower average mean square error of prediction (MSEP) than PCR, while the performance of FAR and PLSR did not differ notably. Overall, therefore, FAR performed better than either PLSR or PCR.
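A Monte Carlo comparison of the kind summarized above can be sketched for the independence structure. The settings here (identity covariance for $\mathbf{x}$, $y \sim N(0, 1)$ independent of $\mathbf{x}$, $n = 30$, $p = 100$, a fixed $a = 5$ components, 200 replications) are assumptions for illustration, not the paper's real-data-based parameters, and the helper names `pcr_fit`, `pls1_fit`, `msep`, and `monte_carlo` are hypothetical. FAR is left out because the abstract does not specify the authors' factor-extraction method. PCR regresses $y$ on the leading principal-component scores of the centered training $\mathbf{x}$, PLS1 uses the standard NIPALS iteration, and MSEP is the average squared prediction error on an independent test sample.

```python
import numpy as np


def pcr_fit(X, y, a):
    """PCR with a components; returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Regress y on the first a PC scores: beta = V_a S_a^{-1} U_a^T y
    beta = Vt[:a].T @ ((U[:, :a].T @ yc) / s[:a])
    return x_mean, y_mean, beta


def pls1_fit(X, y, a):
    """PLS1 via NIPALS with a components; returns (x_mean, y_mean, beta)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean                # deflated predictors and response
    p = X.shape[1]
    W, P, q = np.zeros((p, a)), np.zeros((p, a)), np.zeros(a)
    for j in range(a):
        w = E.T @ f
        w /= np.linalg.norm(w)                   # weight: direction of max covariance with f
        t = E @ w                                # score vector
        tt = t @ t
        W[:, j] = w
        P[:, j] = E.T @ t / tt                   # x-loadings
        q[j] = f @ t / tt                        # y-loading
        E = E - np.outer(t, P[:, j])             # deflate X
        f = f - q[j] * t                         # deflate y
    # Coefficients on the centered original predictors: B = W (P^T W)^{-1} q
    beta = W @ np.linalg.solve(P.T @ W, q)
    return x_mean, y_mean, beta


def msep(model, X, y):
    """Mean squared prediction error of a fitted (x_mean, y_mean, beta) model on (X, y)."""
    x_mean, y_mean, beta = model
    return float(np.mean((y - (y_mean + (X - x_mean) @ beta)) ** 2))


def monte_carlo(n=30, p=100, a=5, n_test=1000, reps=200, seed=0):
    """Average (training MSE, test MSEP) per method when y is independent of x."""
    rng = np.random.default_rng(seed)
    fits = {"PCR": pcr_fit, "PLS1": pls1_fit}
    out = {name: [] for name in fits}
    for _ in range(reps):
        X, y = rng.standard_normal((n, p)), rng.standard_normal(n)
        X_test, y_test = rng.standard_normal((n_test, p)), rng.standard_normal(n_test)
        for name, fit in fits.items():
            model = fit(X, y, a)
            out[name].append((msep(model, X, y), msep(model, X_test, y_test)))
    return {name: np.mean(vals, axis=0) for name, vals in out.items()}


if __name__ == "__main__":
    for name, (train_mse, test_msep) in monte_carlo().items():
        print(f"{name}: training MSE = {train_mse:.3f}, test MSEP = {test_msep:.3f}")
```

Comparing each method's training error with its test MSEP is one way to see over-fitting: a method that fits the training $y$ far more closely than it predicts new $y$ is fitting noise. The sketch claims no particular numbers; the paper's own results are those stated in the abstract.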
Pages: 115-132
Number of pages: 17
Source
Journal of Quantitative Economics, 2022, 20 (Suppl 1): 115-132