Identifying Datasets for Cross-Study Analysis in dbGaP using PhenX

Cited by: 2
Authors
Pan, Huaqin [1]
Bakalov, Vesselina [1]
Cox, Lisa [1]
Engle, Michelle L. [1]
Erickson, Stephen W. [1]
Feolo, Michael [2]
Guo, Yuelong [3]
Huggins, Wayne [1]
Hwang, Stephen [1]
Kimura, Masato [2]
Krzyzanowski, Michelle [1]
Levy, Josh [4]
Phillips, Michael [1]
Qin, Ying [1]
Williams, David [1]
Ramos, Erin M. [5]
Hamilton, Carol M. [1]
Affiliations
[1] RTI Int, Res Triangle Pk, NC 27709 USA
[2] NLM, Natl Ctr Biotechnol Informat, NIH, Bethesda, MD USA
[3] GeneCentr Therapeut Inc, Durham, NC USA
[4] Levy Informat, Chapel Hill, NC USA
[5] NHGRI, NIH, Bethesda, MD 20892 USA
Funding
National Institutes of Health (USA);
Keywords
GENOTYPES; DATABASE; TOOLKIT; HARMONIZATION; DISEASE; LOCI;
DOI
10.1038/s41597-022-01660-4
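Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification code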
07 ; 0710 ; 09 ;
Abstract
Identifying relevant studies and harmonizing datasets are major hurdles for data reuse. Common Data Elements (CDEs) can help identify comparable study datasets and reduce the burden of retrospective data harmonization, but historically they have not been required. The collaborative team at PhenX and dbGaP developed an approach to use PhenX variables as a set of CDEs to link phenotypic data and identify comparable studies in dbGaP. Variables were identified as either comparable or related, based on the data collection mode used to harmonize data across mapped datasets. We further added a CDE data field in the dbGaP data submission packet to indicate use of PhenX and annotate linkages in the future. Some 13,653 dbGaP variables from 521 studies were linked through PhenX variable mapping. These variable linkages have been made accessible for browsing and searching in the repository through the dbGaP CDE-faceted search filter and the PhenX variable search tool. New features in dbGaP and PhenX enable investigators to identify variable linkages among dbGaP studies and reveal opportunities for cross-study analysis.
Pages: 9
Journal: Scientific Data, 9