Gaussian Mixture Modeling Extensions for Improved False Discovery Rate Estimation in GC-MS Metabolomics

Cited by: 1

Authors:

Flores, Javier E. [1]

Bramer, Lisa M. [1]

Degnan, David J. [1]

Paurus, Vanessa L. [2]

Corilo, Yuri E. [2]

Clendinen, Chaevien S. [2]

Affiliations:

[1] Pacific Northwest Natl Lab, Biol Sci Div, Richland, WA 99354 USA

[2] Pacific Northwest Natl Lab, Environm Mol Sci Div, Richland, WA 99354 USA

Source:

JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY | 2023, Vol. 34, No. 6

Keywords:

metabolite identification; spectral similarity score; false positive rate; IDENTIFICATION;

DOI:

10.1021/jasms.3c00039

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification codes:

071010 ; 081704 ;

Abstract:

The ability to reliably identify small molecules (e.g., metabolites) is key toward driving scientific advancement in metabolomics. Gas chromatography-mass spectrometry (GC-MS) is an analytic method that may be applied to facilitate this process. The typical GC-MS identification workflow involves quantifying the similarity of an observed sample spectrum and other features (e.g., retention index) to that of several references, noting the compound of the best-matching reference spectrum as the identified metabolite. While a deluge of similarity metrics exist, none quantify the error rate of generated identifications, thereby presenting an unknown risk of false identification or discovery. To quantify this unknown risk, we propose a model-based framework for estimating the false discovery rate (FDR) among a set of identifications. Extending a traditional mixture modeling framework, our method incorporates both similarity score and experimental information in estimating the FDR. We apply these models to identification lists derived from across 548 samples of varying complexity and sample type (e.g., fungal species, standard mixtures, etc.), comparing their performance to that of the traditional Gaussian mixture model (GMM). Through simulation, we additionally assess the impact of reference library size on the accuracy of FDR estimates. In comparing the best performing model extensions to the GMM, our results indicate relative decreases in median absolute estimation error (MAE) ranging from 12% to 70%, based on comparisons of the median MAEs across all hit-lists. Results indicate that these relative performance improvements generally hold despite library size; however FDR estimation error typically worsens as the set of reference compounds diminishes.
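For orientation, the baseline that the abstract calls the traditional Gaussian mixture model can be sketched as follows. This is a minimal illustration and not the authors' implementation or their proposed extensions: it assumes one similarity score per identification, models the scores as a two-component Gaussian mixture (the lower-mean component is assumed to hold incorrect identifications), and estimates the FDR above a threshold as the mean posterior probability of being incorrect among accepted identifications. The function names, the simulated scores, and the thresholds are all hypothetical.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    """Density of a normal distribution evaluated at x."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def responsibilities(scores, weights, means, sds):
    """Posterior probability of each component for each score, shape (2, n)."""
    dens = np.stack([w * normal_pdf(scores, m, s)
                     for w, m, s in zip(weights, means, sds)])
    total = np.maximum(dens.sum(axis=0), 1e-300)  # guard against division by zero
    return dens / total


def fit_two_component_gmm(scores, n_iter=200):
    """Fit a 1-D two-component Gaussian mixture with plain EM."""
    scores = np.asarray(scores, dtype=float)
    # Assumed initialisation: component means at the lower and upper quartiles.
    means = np.percentile(scores, [25, 75])
    sds = np.full(2, scores.std())
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        resp = responsibilities(scores, weights, means, sds)   # E-step
        nk = resp.sum(axis=1)                                  # M-step
        weights = nk / scores.size
        means = (resp @ scores) / nk
        var = (resp * (scores - means[:, None]) ** 2).sum(axis=1) / nk
        sds = np.maximum(np.sqrt(var), 1e-6)
    return weights, means, sds


def estimate_fdr(scores, threshold, weights, means, sds):
    """Mean posterior probability of 'incorrect' among scores >= threshold."""
    scores = np.asarray(scores, dtype=float)
    accepted = scores >= threshold
    if not accepted.any():
        return 0.0
    resp = responsibilities(scores, weights, means, sds)
    incorrect = int(np.argmin(means))  # assumption: low scores are incorrect hits
    return float(resp[incorrect, accepted].mean())


if __name__ == "__main__":
    # Simulated scores only; these are not data from the paper.
    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.normal(0.4, 0.08, 600),   # assumed incorrect hits
                             rng.normal(0.8, 0.06, 400)])  # assumed correct hits
    weights, means, sds = fit_two_component_gmm(scores)
    for threshold in (0.5, 0.6, 0.7):
        fdr = estimate_fdr(scores, threshold, weights, means, sds)
        print(f"threshold={threshold:.1f}  estimated FDR={fdr:.3f}")
```

The sketch uses the similarity score alone. The extensions described in the abstract additionally incorporate experimental information, which is not reproduced here.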


Pages: 1096-1104

Page count: 9
