Gaussian Mixture Modeling Extensions for Improved False Discovery Rate Estimation in GC-MS Metabolomics

Cited by: 1
Authors
Flores, Javier E. [1 ]
Bramer, Lisa M. [1 ]
Degnan, David J. [1 ]
Paurus, Vanessa L. [2 ]
Corilo, Yuri E. [2 ]
Clendinen, Chaevien S. [2 ]
Affiliations
[1] Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99354 USA
[2] Pacific Northwest Natl Lab, Environm Mol Sci Div, Richland, WA 99354 USA
Keywords
metabolite identification; spectral similarity score; false positive rate; identification
DOI
10.1021/jasms.3c00039
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The ability to reliably identify small molecules (e.g., metabolites) is key to driving scientific advancement in metabolomics. Gas chromatography-mass spectrometry (GC-MS) is an analytical method that may be applied to facilitate this process. The typical GC-MS identification workflow quantifies the similarity of an observed sample spectrum and other features (e.g., retention index) to those of several references, and reports the compound of the best-matching reference spectrum as the identified metabolite. While many similarity metrics exist, none quantify the error rate of the generated identifications, which leaves an unknown risk of false identification or discovery. To quantify this risk, we propose a model-based framework for estimating the false discovery rate (FDR) among a set of identifications. Extending a traditional mixture modeling framework, our method incorporates both similarity score and experimental information in estimating the FDR. We apply these models to identification lists derived from 548 samples of varying complexity and sample type (e.g., fungal species, standard mixtures), comparing their performance to that of the traditional Gaussian mixture model (GMM). Through simulation, we additionally assess the impact of reference library size on the accuracy of FDR estimates. In comparing the best-performing model extensions to the GMM, our results indicate relative decreases in median absolute estimation error (MAE) ranging from 12% to 70%, based on comparisons of the median MAEs across all hit-lists. Results indicate that these relative performance improvements generally hold regardless of library size; however, FDR estimation error typically worsens as the set of reference compounds diminishes.
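To make the baseline concrete, the following is a minimal sketch of how a traditional two-component GMM could yield an FDR estimate from similarity scores. It is not the authors' implementation and does not include their extensions that use experimental information. The function names (`fit_two_component_gmm`, `posterior_false`, `estimate_fdr`), the quartile-based initialization, and the convention that the lower-mean component represents false identifications are all assumptions made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_gmm(scores, n_iter=200):
    """Fit a 1D two-component Gaussian mixture by EM (illustrative sketch).

    Returns (weights, means, sds), ordered so that component 0 has the
    lower mean. Assumption: component 0 models false identifications.
    """
    n = len(scores)
    ordered = sorted(scores)
    # Assumed initialization: means at the lower and upper quartiles,
    # both standard deviations set to the overall standard deviation.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    grand_mean = sum(scores) / n
    overall_sd = math.sqrt(sum((x - grand_mean) ** 2 for x in scores) / n)
    sds = [max(overall_sd, 1e-6)] * 2
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior membership probabilities for each score.
        resp = []
        for x in scores:
            p = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total] if total > 0 else [0.5, 0.5])
        # M-step: update weights, means, and standard deviations.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, scores)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)

    if means[0] > means[1]:  # keep the lower-mean component first
        weights.reverse()
        means.reverse()
        sds.reverse()
    return weights, means, sds


def posterior_false(x, params):
    """Posterior probability that score x came from the 'false' component."""
    weights, means, sds = params
    p_false = weights[0] * normal_pdf(x, means[0], sds[0])
    p_true = weights[1] * normal_pdf(x, means[1], sds[1])
    total = p_false + p_true
    return p_false / total if total > 0 else 0.5


def estimate_fdr(scores, params, threshold):
    """Estimated FDR among hits with score >= threshold.

    Computed as the average posterior probability of being false over
    the accepted hits.
    """
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        raise ValueError("no hits at or above the threshold")
    return sum(posterior_false(x, params) for x in accepted) / len(accepted)
```

Given a list of similarity scores, one would call `params = fit_two_component_gmm(scores)` and then `estimate_fdr(scores, params, threshold)` for a chosen score cutoff. Averaging posterior probabilities over the accepted hits is one common way to turn a fitted mixture into an FDR estimate; the abstract does not specify which estimator the authors use.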
Pages: 1096-1104
Page count: 9