Gaussian Mixture Modeling Extensions for Improved False Discovery Rate Estimation in GC-MS Metabolomics

Cited by: 1
Authors
Flores, Javier E. [1]
Bramer, Lisa M. [1]
Degnan, David J. [1]
Paurus, Vanessa L. [2]
Corilo, Yuri E. [2]
Clendinen, Chaevien S. [2]
Affiliations
[1] Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99354 USA
[2] Pacific Northwest Natl Lab, Environm Mol Sci Div, Richland, WA 99354 USA
Keywords
metabolite identification; spectral similarity score; false positive rate; identification
DOI
10.1021/jasms.3c00039
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The ability to reliably identify small molecules (e.g., metabolites) is key to driving scientific advancement in metabolomics. Gas chromatography-mass spectrometry (GC-MS) is an analytic method that may be applied to facilitate this process. The typical GC-MS identification workflow involves quantifying the similarity of an observed sample spectrum and other features (e.g., retention index) to that of several references, noting the compound of the best-matching reference spectrum as the identified metabolite. While a deluge of similarity metrics exists, none quantify the error rate of generated identifications, thereby presenting an unknown risk of false identification or discovery. To quantify this unknown risk, we propose a model-based framework for estimating the false discovery rate (FDR) among a set of identifications. Extending a traditional mixture modeling framework, our method incorporates both similarity score and experimental information in estimating the FDR. We apply these models to identification lists derived from 548 samples of varying complexity and sample type (e.g., fungal species, standard mixtures), comparing their performance to that of the traditional Gaussian mixture model (GMM). Through simulation, we additionally assess the impact of reference library size on the accuracy of FDR estimates. In comparing the best performing model extensions to the GMM, our results indicate relative decreases in median absolute estimation error (MAE) ranging from 12% to 70%, based on comparisons of the median MAEs across all hit-lists. Results indicate that these relative performance improvements generally hold regardless of library size; however, FDR estimation error typically worsens as the set of reference compounds diminishes.
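As an illustration of the baseline the abstract refers to, the following is a minimal sketch of FDR estimation with a traditional two-component Gaussian mixture model fitted to similarity scores. It is not the authors' implementation and does not include their extensions (the experimental covariates). The function names, the simulated scores, the median-split initialization, and the convention that the lower-mean component represents false identifications are all assumptions made for this example.

```python
import math
import random


def normal_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mean_sd(values):
    """Mean and (floored) population standard deviation of a list."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, max(math.sqrt(var), 1e-3)


def posterior_false(x, pi, mu, sd):
    """Posterior probability that a hit with score x belongs to component 0."""
    f0 = pi[0] * normal_pdf(x, mu[0], sd[0])
    f1 = pi[1] * normal_pdf(x, mu[1], sd[1])
    total = f0 + f1
    return f0 / total if total > 0.0 else 0.5


def fit_gmm(scores, n_iter=200):
    """Fit a 1-D two-component GMM by EM (hypothetical helper).

    Assumption: component 0 (lower mean) models false identifications.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    # Initialize by splitting the sorted scores at the median.
    (m0, s0), (m1, s1) = mean_sd(ordered[:half]), mean_sd(ordered[half:])
    pi, mu, sd = [0.5, 0.5], [m0, m1], [s0, s1]
    n = len(scores)
    for _ in range(n_iter):
        # E-step: responsibility of the "false" component for each score.
        r0 = [posterior_false(x, pi, mu, sd) for x in scores]
        # M-step: update weight, mean, and standard deviation per component.
        for k in range(2):
            resp = r0 if k == 0 else [1.0 - r for r in r0]
            nk = max(sum(resp), 1e-12)
            mu[k] = sum(r * x for r, x in zip(resp, scores)) / nk
            var = sum(r * (x - mu[k]) ** 2 for r, x in zip(resp, scores)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)
            pi[k] = nk / n
    if mu[0] > mu[1]:  # keep component 0 as the lower-scoring one
        pi.reverse()
        mu.reverse()
        sd.reverse()
    return pi, mu, sd


def estimate_fdr(scores, threshold, pi, mu, sd):
    """Estimated FDR: mean posterior 'false' probability among accepted hits."""
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        return 0.0
    return sum(posterior_false(x, pi, mu, sd) for x in accepted) / len(accepted)


# Simulated similarity scores (assumption): 300 false and 200 true matches.
random.seed(0)
scores = [random.gauss(0.4, 0.1) for _ in range(300)]
scores += [random.gauss(0.8, 0.07) for _ in range(200)]

pi, mu, sd = fit_gmm(scores)
fdr_strict = estimate_fdr(scores, 0.7, pi, mu, sd)       # accept only high scores
fdr_all = estimate_fdr(scores, min(scores), pi, mu, sd)  # accept every hit
```

In this sketch, raising the score threshold shrinks the accepted set and lowers the estimated FDR, while accepting every hit gives an estimate close to the fitted weight of the false component. On real hit-lists the two score distributions overlap far more than in this simulation, which is the setting where the paper's extensions are reported to reduce estimation error.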
Pages: 1096-1104
Page count: 9