xMSanalyzer: automated pipeline for improved feature detection and downstream analysis of large-scale, non-targeted metabolomics data

Cited by: 274
Authors
Uppal, Karan [1, 6]
Soltow, Quinlyn A. [2]
Strobel, Frederick H. [3]
Pittard, W. Stephen [1]
Gernert, Kim M. [1]
Yu, Tianwei [4]
Jones, Dean P. [2, 5]
Affiliations
[1] Emory Univ, Sch Med, BimCore, Atlanta, GA USA
[2] Emory Univ, Dept Med, Div Pulm Allergy & Crit Care, Atlanta, GA 30322 USA
[3] Emory Univ, Mass Spectrometry Ctr, Atlanta, GA 30322 USA
[4] Emory Univ, Rollins Sch Publ Hlth, Dept Biostat & Bioinformat, Atlanta, GA 30322 USA
[5] Emory Univ, Clin Biomarkers Lab, Atlanta, GA 30322 USA
[6] Georgia Inst Technol, Sch Biol, Atlanta, GA 30332 USA
Source
BMC Bioinformatics | 2013 / Vol. 14
Funding
National Institutes of Health (USA);
Keywords
OPEN-SOURCE SOFTWARE; MASS; ALIGNMENT; ALGORITHMS; FRAMEWORK; OPENMS; MZMINE; SUITE;
DOI
10.1186/1471-2105-14-15
Chinese Library Classification
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Background: Detection of low abundance metabolites is important for de novo mapping of metabolic pathways related to diet, microbiome or environmental exposures. Multiple algorithms are available to extract m/z features from liquid chromatography-mass spectral data in a conservative manner, which tends to preclude detection of low abundance chemicals and chemicals found in small subsets of samples. The present study provides software to enhance such algorithms for feature detection, quality assessment, and annotation. Results: xMSanalyzer is a set of utilities for automated processing of metabolomics data. The utilities can be classified into four main modules to: 1) improve feature detection for replicate analyses by systematic re-extraction with multiple parameter settings and data merger to optimize the balance between sensitivity and reliability, 2) evaluate sample quality and feature consistency, 3) detect feature overlap between datasets, and 4) characterize high-resolution m/z matches to small molecule metabolites and biological pathways using multiple chemical databases. The package was tested with plasma samples and shown to more than double the number of features extracted while improving quantitative reliability of detection. MS/MS analysis of a random subset of peaks that were exclusively detected using xMSanalyzer confirmed that the optimization scheme improves detection of real metabolites. Conclusions: xMSanalyzer is a package of utilities for data extraction, quality control assessment, detection of overlapping and unique metabolites in multiple datasets, and batch annotation of metabolites. The program was designed to integrate with existing packages such as apLCMS and XCMS, but the framework can also be used to enhance data extraction for other LC/MS data software.
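The first module (re-extraction with multiple parameter settings followed by data merger) can be illustrated with a minimal Python sketch. This is not the xMSanalyzer implementation, which is an R package; the `Feature` class, the function names, the 10 ppm and 10 s tolerances, and the rule of keeping the duplicate with the lower coefficient of variation (CV) across replicates are all assumptions made for illustration.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class Feature:
    """Hypothetical feature record: m/z, retention time (s), replicate intensities."""
    mz: float
    rt: float
    intensities: list


def cv(feature):
    """Coefficient of variation across replicates (needs >= 2 intensities)."""
    m = mean(feature.intensities)
    return stdev(feature.intensities) / m if m > 0 else float("inf")


def same_feature(a, b, ppm_tol=10.0, rt_tol=10.0):
    """Treat two features as the same if m/z (in ppm) and RT are within tolerance."""
    ppm = abs(a.mz - b.mz) / a.mz * 1e6
    return ppm <= ppm_tol and abs(a.rt - b.rt) <= rt_tol


def merge_runs(run_a, run_b, ppm_tol=10.0, rt_tol=10.0):
    """Union of two extraction runs; for duplicates keep the lower-CV feature."""
    merged = list(run_a)
    for fb in run_b:
        for i, fa in enumerate(merged):
            if same_feature(fa, fb, ppm_tol, rt_tol):
                if cv(fb) < cv(fa):
                    merged[i] = fb  # assumed rule: prefer the more reproducible copy
                break
        else:
            merged.append(fb)  # feature found only with the second setting
    return merged


# Made-up feature lists standing in for two parameter settings.
run_a = [Feature(100.0000, 50.0, [100, 110, 90]),
         Feature(200.0000, 120.0, [500, 505, 495])]
run_b = [Feature(100.0005, 52.0, [100, 101, 99]),
         Feature(300.0000, 200.0, [10, 12, 11])]
merged = merge_runs(run_a, run_b)
```

In this toy example the merged list gains the feature seen only under the second setting, which is the sense in which combining settings can raise sensitivity, while the duplicate rule is one plausible way to favor reliability.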
Pages: 12