Log-ratio lasso: Scalable, sparse estimation for log-ratio models

Cited by: 18
Authors
Bates, Stephen [1]
Tibshirani, Robert [1, 2]
Affiliations
[1] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[2] Stanford Univ, Dept Biomed Data Sci, Stanford, CA 94305 USA
Keywords
compositional data; lasso; log-ratio; mass spectrometry; variable selection; post-selection inference; regression
DOI
10.1111/biom.12995
Chinese Library Classification
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Positive-valued signal data is common in the biological and medical sciences, due to the prevalence of mass spectrometry and other imaging techniques. With such data, only the relative intensities of the raw measurements are meaningful. It is desirable to consider models consisting of the log-ratios of all pairs of the raw features, since log-ratios are the simplest meaningful derived features. In this case, however, the dimensionality of the predictor space becomes large, and computationally efficient estimation procedures are required. In this work, we introduce an embedding of the log-ratio parameter space into a space of much lower dimension and use this representation to develop an efficient penalized fitting procedure. This procedure serves as the foundation for a two-step fitting procedure that combines a convex filtering step with a second non-convex pruning step to yield highly sparse solutions. On a cancer proteomics data set, the proposed method fits a highly sparse model consisting of features of known biological relevance while greatly improving upon the predictive accuracy of less interpretable methods.
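The dimension reduction mentioned in the abstract can be illustrated with a standard identity: since log(x_j / x_k) = log x_j - log x_k, any linear model on all p(p-1)/2 pairwise log-ratios equals a linear model on the p log-features whose coefficients sum to zero. The sketch below checks this numerically. It is only an illustration of that identity, not the authors' fitting procedure; the function names and the simulated data are hypothetical.

```python
import math
import random
from itertools import combinations


def all_pairs_log_ratios(x):
    """Return log(x[j] / x[k]) for every pair j < k, in combinations() order."""
    return [math.log(x[j] / x[k]) for j, k in combinations(range(len(x)), 2)]


def embed_coefficients(theta, p):
    """Collapse pairwise coefficients into p per-feature coefficients.

    theta is ordered like combinations(range(p), 2). The pair (j, k) adds its
    coefficient to feature j and subtracts it from feature k, so the result
    always sums to zero.
    """
    beta = [0.0] * p
    for (j, k), t in zip(combinations(range(p), 2), theta):
        beta[j] += t
        beta[k] -= t
    return beta


# Hypothetical data: one positive-valued sample with p raw features.
random.seed(0)
p = 6
x = [random.uniform(0.5, 5.0) for _ in range(p)]
theta = [random.gauss(0.0, 1.0) for _ in range(p * (p - 1) // 2)]

# Linear predictor in the large space of all pairwise log-ratios.
pairwise_fit = sum(t * r for t, r in zip(theta, all_pairs_log_ratios(x)))

# The same predictor in the p-dimensional space of log-features.
beta = embed_coefficients(theta, p)
embedded_fit = sum(b * math.log(v) for b, v in zip(beta, x))

# Rescaling every raw intensity leaves the predictor unchanged,
# because the coefficients sum to zero.
scaled_fit = sum(b * math.log(10.0 * v) for b, v in zip(beta, x))
```

Here `pairwise_fit`, `embedded_fit`, and `scaled_fit` agree up to floating-point error, which is why only relative intensities matter in such a model and why the 15 pairwise coefficients for p = 6 can be represented by 6 constrained ones.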
Pages: 613-624
Page count: 12
Related papers
50 in total
  • [31] Proportion statistics to detect differentially expressed genes: a comparison with log-ratio statistics
    Tracy L Bergemann
    Jason Wilson
    [J]. BMC Bioinformatics, 12
  • [32] Baseline correction for stray light in log-ratio diode laser absorption measurements
    Krishna, Yedhu
    O'Byrne, Sean
    Kurtz, Joseph John
    [J]. APPLIED OPTICS, 2014, 53 (19) : 4128 - 4135
  • [33] Log-ratio analysis of microbiome data with many zeroes is library size dependent
    te Beest, Dennis E.
    Nijhuis, Els H.
    Mohlmann, Tim W. R.
    ter Braak, Cajo J. F.
    [J]. MOLECULAR ECOLOGY RESOURCES, 2021, 21 (06) : 1866 - 1874
  • [34] Compositional Data in Geostatistics: A Log-Ratio Based Framework to Analyze Regionalized Compositions
    Pawlowsky-Glahn, V.
    Egozcue, J. J.
    [J]. MATHEMATICAL GEOSCIENCES, 2020, 52 (08) : 1067 - 1084
  • [35] CHANGE DETECTION OF POLARIMETRIC SAR IMAGES USING MINKOWSKI LOG-RATIO DISTANCE
    Chen, Shuailin
    Yang, Xiangli
    Zou, Tongyuan
    Peng, Dong
    Yang, Wen
    Li, Heng-Chao
    [J]. IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM, 2020, : 336 - 339
  • [36] A Novel Asymmetrical Probability Density Function for Modeling Log-Ratio SAR Images
    Ren, Weilong
    Song, Jianshe
    Zeng, Jing
    Zhang, Xiongmei
    [J]. IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2016, 13 (03) : 369 - 373
  • [37] Counts: an outstanding challenge for log-ratio analysis of compositional data in the molecular biosciences
    Lovell, David R.
    Chua, Xin-Yi
    McGrath, Annette
    [J]. NAR GENOMICS AND BIOINFORMATICS, 2020, 2 (02)
  • [38] Proportion statistics to detect differentially expressed genes: a comparison with log-ratio statistics
    Bergemann, Tracy L.
    Wilson, Jason
    [J]. BMC BIOINFORMATICS, 2011, 12
  • [39] Weathering reactions and isometric log-ratio coordinates: Do they speak to each other?
    Buccianti, Antonella
    Zuo, Renguang
    [J]. APPLIED GEOCHEMISTRY, 2016, 75 : 189 - 199
  • [40] Probabilistic Multi-Shape Representation Using an Isometric Log-Ratio Mapping
    Changizi, Neda
    Hamarneh, Ghassan
    [J]. MEDICAL IMAGE COMPUTING AND COMPUTER-ASSISTED INTERVENTION - MICCAI 2010, PT III, 2010, 6363 : 563 - 570