An Untargeted Metabolomics Workflow that Scales to Thousands of Samples for Population-Based Studies

Cited by: 1
Authors
Stancliffe, Ethan [1 ,2 ]
Schwaiger-Haber, Michaela [1 ,2 ]
Sindelar, Miriam [1 ,2 ]
Murphy, Matthew J. [1 ,2 ]
Soerensen, Mette [3 ]
Patti, Gary J. [1 ,2 ,4 ]
Affiliations
[1] Washington Univ, Dept Chem, Dept Med, St Louis, MO 63130 USA
[2] Washington Univ, Ctr Metabol & Isotope Tracing, St Louis, MO 63130 USA
[3] Univ Southern Denmark, Dept Publ Hlth, Epidemiol Biostat & Biodemog, DK-5230 Odense, Denmark
[4] Washington Univ, Siteman Canc Ctr, St Louis, MO 63130 USA
Funding
U.S. National Institutes of Health;
Keywords
SPECTROMETRY DATA; ANNOTATION; ALIGNMENT; STRATEGY; DATABASE; XCMS; MILK; MS;
DOI
10.1021/acs.analchem.2c01270
Chinese Library Classification
O65 [Analytical Chemistry];
Discipline classification code
070302 ; 081704 ;
Abstract
The success of precision medicine relies upon collecting data from many individuals at the population level. Although advancing technologies have made such large-scale studies increasingly feasible in some disciplines such as genomics, the standard workflows currently implemented in untargeted metabolomics were developed for small sample numbers and are limited by the processing of liquid chromatography/mass spectrometry data. Here we present an untargeted metabolomics workflow that is designed to support large-scale projects with thousands of biospecimens. Our strategy is to first evaluate a reference sample created by pooling aliquots of biospecimens from the cohort. The reference sample captures the chemical complexity of the biological matrix in a small number of analytical runs, which can subsequently be processed with conventional software such as XCMS. Although this generates thousands of so-called features, most do not correspond to unique compounds from the samples and can be filtered with established informatics tools. The features remaining represent a comprehensive set of biologically relevant reference chemicals that can then be extracted from the entire cohort's raw data on the basis of m/z values and retention times by using Skyline. To demonstrate applicability to large cohorts, we evaluated >2000 human plasma samples with our workflow. We focused our analysis on 360 identified compounds, but we also profiled >3000 unknowns from the plasma samples. As part of our workflow, we tested 14 different computational approaches for batch correction and found that a random forest-based approach outperformed the others. The corrected data revealed distinct profiles that were associated with the geographic location of participants.
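The targeted extraction step described in the abstract, pulling reference chemicals out of raw data by m/z value and retention time, can be illustrated with a minimal sketch. This is not Skyline's algorithm or the authors' code: the `Target` and `Peak` types, the `extract_targets` function, and the default tolerances (10 ppm, 15 s) are all hypothetical choices made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """A reference chemical from the pooled sample (hypothetical type)."""
    name: str
    mz: float   # reference m/z
    rt: float   # reference retention time, in seconds


@dataclass(frozen=True)
class Peak:
    """A detected signal in one sample's data (hypothetical type)."""
    mz: float
    rt: float
    intensity: float


def ppm_error(observed: float, reference: float) -> float:
    """Signed mass error of `observed` relative to `reference`, in ppm."""
    return (observed - reference) / reference * 1e6


def extract_targets(targets, peaks, ppm_tol=10.0, rt_tol=15.0):
    """Sum the intensity of every peak that falls inside each target's
    m/z (ppm) and retention-time (seconds) windows.

    The tolerances are illustrative assumptions, not values from the paper.
    """
    result = {}
    for target in targets:
        total = 0.0
        for peak in peaks:
            in_mz = abs(ppm_error(peak.mz, target.mz)) <= ppm_tol
            in_rt = abs(peak.rt - target.rt) <= rt_tol
            if in_mz and in_rt:
                total += peak.intensity
        result[target.name] = total
    return result
```

Because the target list is fixed in advance from the reference sample, each cohort sample can be processed independently with a function of this shape, which is what lets the approach scale to thousands of files.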
Pages: 17370-17378
Page count: 9