Group additive regression models for genomic data analysis

Cited by: 35
Authors
Luan, Yihui [1, 2]
Li, Hongzhe [1]
Affiliations
[1] Univ Penn, Sch Med, Dept Biostat & Epidemiol, Philadelphia, PA 19104 USA
[2] Shandong Univ, Sch Math & Syst Sci, Shandong 250100, Peoples R China
Keywords
AFT models; boosting; gradient descent boosting; pathway
DOI
10.1093/biostatistics/kxm015
Chinese Library Classification
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
One important problem in genomic research is to identify genomic features, such as gene expression data or DNA single nucleotide polymorphisms (SNPs), that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups, such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group features that are related to the phenotypes. In addition, the prediction mean square errors are smaller than those of the component-wise boosting procedure. We demonstrate the application of the methods to pathway-based analysis of a breast cancer microarray gene expression data set. The results indicate that the pathways of metalloendopeptidases (MMPs) and MMP inhibitors, as well as cell proliferation, cell growth, and maintenance, are important to breast cancer-specific survival.
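The abstract names a group gradient descent boosting procedure without spelling it out. The following is only a minimal sketch of the general idea, under stated assumptions: squared-error loss, linear least-squares base learners, and one whole group of variables updated per step. It is not the authors' implementation and does not cover the AFT models listed in the keywords. The function name `group_l2_boost`, the step size `nu`, the step count `n_steps`, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def group_l2_boost(X, y, groups, n_steps=100, nu=0.1):
    """Group-wise L2 boosting sketch (assumed linear base learners).

    X      : (n, p) array with centered columns
    y      : (n,) response
    groups : dict mapping a group name to a list of column indices
    """
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()
    fitted = np.full(n, intercept)
    selected = []
    for _ in range(n_steps):
        # Residuals are the negative gradient of squared-error loss.
        resid = y - fitted
        best = None
        for name, idx in groups.items():
            # Fit the residuals using only the variables in this group.
            beta, *_ = np.linalg.lstsq(X[:, idx], resid, rcond=None)
            rss = np.sum((resid - X[:, idx] @ beta) ** 2)
            if best is None or rss < best[0]:
                best = (rss, name, idx, beta)
        _, name, idx, beta = best
        # Take a small step along the best-fitting group only.
        coef[idx] += nu * beta
        fitted += nu * (X[:, idx] @ beta)
        selected.append(name)
    return intercept, coef, selected

# Simulated data (hypothetical): only group "A" drives the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 6))
X -= X.mean(axis=0)
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(200)
groups = {"A": [0, 1], "B": [2, 3], "C": [4, 5]}

intercept, coef, selected = group_l2_boost(X, y, groups, n_steps=20, nu=0.1)
print(selected[:5], np.round(coef, 2))
```

Component-wise boosting corresponds to the special case in which every group contains a single variable. Grouping instead lets variables in the same pathway enter the model together, which is the comparison the abstract describes.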
Pages: 100-113
Page count: 14