A case study of normalization, missing data and variable selection methods in lipidomics

Cited by: 7
Authors
Kujala, M. [1]
Nevalainen, J. [1]
Affiliations
[1] Univ Turku, Dept Math & Stat, FI-20014 Turku, Finland
Keywords
lipidomics; left censoring; multiple imputation; normalization; penalized logistic regression; permutation tests; MULTIPLE IMPUTATION; REGRESSION; BIOINFORMATICS;
DOI
10.1002/sim.6296
Chinese Library Classification number
Q [Biological sciences];
Subject classification codes
07 ; 0710 ; 09 ;
Abstract
Lipidomics is an emerging field of science that holds the potential to provide a readout of biomarkers for an early detection of a disease. Our objective was to identify an efficient statistical methodology for lipidomics, especially in finding interpretable and predictive biomarkers useful for clinical practice. In two case studies, we address the need for data preprocessing for regression modeling of a binary response. These are based on a normalization step, in order to remove experimental variability, and on a multiple imputation step, to make the full use of the incompletely observed data with potentially informative missingness. Finally, by cross-validation, we compare stepwise variable selection to penalized regression models on stacked multiple imputed data sets and propose the use of a permutation test as a global test of association. Our results show that, depending on the design of the study, these data preprocessing methods modestly improve the precision of classification, and no clear winner among the variable selection methods is found. Lipidomics profiles are found to be highly important predictors in both of the two case studies. Copyright (c) 2014 John Wiley & Sons, Ltd.
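The abstract proposes a permutation test as a global test of association between the lipid profile and a binary response, but it does not state the test statistic. The sketch below is therefore only an illustration of the general idea, not the authors' procedure: it assumes a simple stand-in statistic (the largest absolute difference in feature means between the two classes) rather than one derived from a penalized regression fit, and it runs on simulated data. All function names and parameters (`global_statistic`, `permutation_p_value`, `simulate`, `shift`) are hypothetical.

```python
import random


def global_statistic(X, y):
    """Stand-in global statistic (an assumption, not the paper's choice):
    the largest absolute difference in feature means between classes."""
    cases = [row for row, label in zip(X, y) if label == 1]
    controls = [row for row, label in zip(X, y) if label == 0]
    best = 0.0
    for j in range(len(X[0])):
        mean_cases = sum(row[j] for row in cases) / len(cases)
        mean_controls = sum(row[j] for row in controls) / len(controls)
        best = max(best, abs(mean_cases - mean_controls))
    return best


def permutation_p_value(X, y, n_permutations=999, seed=0):
    """Permute the response labels to approximate the null distribution
    of the global statistic, then return a one-sided p-value."""
    rng = random.Random(seed)
    observed = global_statistic(X, y)
    labels = list(y)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # breaks any link between X and the response
        if global_statistic(X, labels) >= observed:
            exceed += 1
    # Count the observed labelling as one of the permutations.
    return (exceed + 1) / (n_permutations + 1)


def simulate(n_per_group=30, n_features=5, shift=0.0, seed=1):
    """Simulated data: Gaussian noise, with the first feature shifted
    by `shift` in the cases (label 1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_group):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            row[0] += shift * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = simulate(shift=1.0)
    print(permutation_p_value(X, y))
```

Because the labels are permuted as a whole, the test makes a single global statement about association and avoids per-lipid multiplicity corrections. In a setting closer to the abstract, the statistic would presumably be a cross-validated performance measure of the fitted model, recomputed for each permutation.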
Pages: 59-73
Number of pages: 15