lrSVD: An efficient imputation algorithm for incomplete high-throughput compositional data

Cited by: 0
Authors
Palarea-Albaladejo, Javier [1]
Antoni Martin-Fernandez, Josep [1]
Ruiz-Gazen, Anne [2]
Thomas-Agnan, Christine [2]
Affiliations
[1] Univ Girona, Dept Comp Sci Appl Math & Stat, Girona 17003, Spain
[2] Toulouse Sch Econ, Toulouse, France
Keywords
zeros; missing data; compositional data; singular value decomposition; log-ratio analysis; missing values
DOI
10.1002/cem.3459
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high-throughput technologies. Within this compositional framework, the focus is put on the relative information conveyed in the data by using log-ratio coordinate representations. However, log-ratios cannot be computed when the data matrix is not complete. A new computationally efficient data imputation algorithm based on compositional principles and aimed at high-throughput continuous-valued compositions is introduced that relies on a constrained low-rank matrix approximation of the data. Simulation and real metabolomics data are used to demonstrate its performance and ability to deal with different forms of incomplete data: zeros, nondetects, missing values or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate its use by practitioners.
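The abstract's point that log-ratios cannot be computed on an incomplete data matrix can be illustrated with a minimal sketch. It uses the centred log-ratio (clr) as one common log-ratio representation. The function name `clr` and the example compositions are assumptions made for illustration; this is not the lrSVD algorithm, nor code from the zCompositions package.

```python
import math


def clr(composition):
    """Centred log-ratio: log of each part minus the mean log of all parts.

    Illustrative helper (hypothetical name), not part of zCompositions.
    """
    if any(part is None or part <= 0 for part in composition):
        # log(0) is undefined, so a zero, nondetect or missing part blocks the transform
        raise ValueError("clr is undefined when a part is zero, negative or missing")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# A complete composition: the clr coordinates sum to zero.
coords = clr([0.2, 0.3, 0.5])

# Only relative information matters: rescaling every part leaves the clr unchanged.
rescaled = clr([20.0, 30.0, 50.0])

# An incomplete composition (one zero part) cannot be transformed.
try:
    clr([0.2, 0.0, 0.8])
    failed = False
except ValueError:
    failed = True
```

Imputing the zero or missing parts first, which is the task the abstract assigns to lrSVD, is what makes a transform of this kind applicable to every row of the data matrix.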
Pages: 13