Weighted pivot coordinates for partial least squares-based marker discovery in high-throughput compositional data

Cited by: 7
Authors
Stefelova, Nikola [1]
Palarea-Albaladejo, Javier [2]
Hron, Karel [1]
Affiliations
[1] Palacky Univ, Fac Sci, 17 Listopadu 12, Olomouc 77146, Czech Republic
[2] Biomath & Stat Scotland, Edinburgh, Midlothian, Scotland
Keywords
compositional data; high-throughput data; log-ratio analysis; marker discovery; PLS regression; METHANE EMISSIONS; ROUNDED ZEROS; REGRESSION; PACKAGE; MODEL;
DOI
10.1002/sam.11514
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High-throughput data representing large mixtures of chemical or biological signals are routinely produced in the molecular sciences. Given a number of samples, partial least squares (PLS) regression is a well-established statistical method to investigate associations between these signals and any continuous response variable of interest. However, technical artifacts generally make the raw signals not directly comparable between samples. Thus, data normalization is required before any meaningful scientific information can be drawn. This often allows the processed signals to be characterized as compositional data, where the relevant information is contained in the pairwise log-ratios between the components of the mixture. The (log-ratio) pivot coordinate approach aggregates the pairwise log-ratios of a component to all the remaining components into a single variable. This simplifies interpretation and the investigation of the components' relative importance but, particularly in a high-dimensional context, the aggregated log-ratios can easily mix up information from different underlying processes. In this context, we propose a weighting strategy for the construction of pivot coordinates for PLS regression which draws on the correlation between the response variable and the pairwise log-ratios. Using real and simulated data sets, we demonstrate that this proposal enhances the discovery of biological markers in high-throughput compositional data.
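The aggregation described in the abstract can be illustrated with a minimal Python sketch. The first function computes the standard first pivot coordinate of a chosen component, which is a scaled sum of its pairwise log-ratios to all other components. The last function is only a hypothetical illustration of a correlation-driven weighting: it weights each pairwise log-ratio by its absolute Pearson correlation with the response. The abstract does not give the authors' weighting formula, so the function names, the weight definition, and the normalization are assumptions, not the published method.

```python
import numpy as np


def pairwise_logratios(X, k=0):
    """Log-ratios ln(x_k / x_j) of component k to every other component j.

    X is an (n_samples, D) array of strictly positive parts.
    Returns an (n_samples, D - 1) array.
    """
    logX = np.log(np.asarray(X, dtype=float))
    return logX[:, [k]] - np.delete(logX, k, axis=1)


def first_pivot_coordinate(X, k=0):
    """Standard first pivot coordinate of component k.

    z = sqrt((D - 1) / D) * ln(x_k / geometric mean of the other parts),
    which equals the sum of the pairwise log-ratios divided by sqrt(D * (D - 1)).
    """
    L = pairwise_logratios(X, k)
    D = L.shape[1] + 1
    return np.sqrt((D - 1) / D) * L.mean(axis=1)


def correlation_weighted_aggregate(X, y, k=0):
    """Hypothetical weighted aggregation (an assumption, not the paper's formula).

    Each pairwise log-ratio is weighted by |corr(log-ratio, y)|, with the
    weights normalized to sum to one. Assumes no log-ratio is constant
    across samples and y is not constant.
    """
    L = pairwise_logratios(X, k)
    y = np.asarray(y, dtype=float)
    Lc = L - L.mean(axis=0)
    yc = y - y.mean()
    r = (Lc * yc[:, None]).sum(axis=0) / np.sqrt(
        (Lc ** 2).sum(axis=0) * (yc ** 2).sum()
    )
    w = np.abs(r)
    w = w / w.sum()
    return L @ w, w
```

Calling `first_pivot_coordinate(X, k)` for each component `k` gives one aggregated variable per component that could be passed to a PLS regression. The weighted variant down-weights log-ratios that show little linear association with the response, which is the general idea the abstract describes.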
Pages: 315-330
Number of pages: 16