Genotype harmonizer: Automatic strand alignment and format conversion for genotype data integration

Cited by: 88
Authors
Deelen P. [1, 2]
Bonder M.J. [2]
Van Der Velde K.J. [1, 2]
Westra H.-J. [2]
Winder E. [1, 2]
Hendriksen D. [1, 2]
Franke L. [2]
Swertz M.A. [1, 2]
Affiliations
[1] University of Groningen, University Medical Center Groningen, Genomics Coordination Center, Groningen
[2] University of Groningen, University Medical Center Groningen, Department of Genetics, Groningen
Keywords
GWAS; Imputation; Linkage disequilibrium; Meta-analysis
DOI
10.1186/1756-0500-7-901
Abstract
Background: To gain statistical power or to allow fine mapping, researchers typically want to pool data before meta-analyses or genotype imputation. However, the necessary harmonization of genetic datasets is currently error-prone because of the many different file formats and a lack of clarity about which genomic strand is used as the reference.
Findings: Genotype Harmonizer (GH) is a command-line tool that harmonizes genetic datasets by automatically resolving issues of genomic strand and file format. GH solves the unknown-strand problem by aligning ambiguous A/T and G/C SNPs to a specified reference, using linkage disequilibrium patterns and requiring no prior knowledge of the strands used. GH supports many common GWAS/NGS genotype formats, including PLINK, binary PLINK, VCF, SHAPEIT2 and Oxford GEN. GH is implemented in Java, and a large part of its functionality can also be used through the Java 'Genotype-IO' API. All software is open source under the LGPLv3 license and available from www.molgenis.org/systemsgenetics.
Conclusions: GH can be used to harmonize genetic datasets across different file formats and can be easily integrated as a step in routine meta-analysis and imputation pipelines. © 2014 Deelen et al.; licensee BioMed Central.
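The idea of aligning ambiguous A/T and G/C SNPs by linkage disequilibrium can be illustrated with a small sketch. This is not GH's actual implementation (GH is written in Java, and its exact procedure and thresholds are not described in the abstract). It is a simplified Python illustration under stated assumptions: genotypes are two-letter strings, neighbouring SNPs are unambiguous and already aligned to the reference, and the function names, the toy data, and the `min_abs_r` cutoff are all hypothetical. The sketch compares the sign of the genotype correlation between the ambiguous SNP and each neighbour in the study data and in the reference; if the signs disagree for most neighbours, the study SNP is assumed to be on the opposite strand.

```python
# Illustrative sketch only; names, data and thresholds are assumptions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_strand_ambiguous(allele1, allele2):
    """A/T and G/C SNPs look identical on both strands."""
    return COMPLEMENT[allele1] == allele2


def swap_strand(genotype):
    """Replace each allele of a genotype string by its complement."""
    return "".join(COMPLEMENT[a] for a in genotype)


def dosages(genotypes, counted_allele):
    """Count copies of counted_allele per sample, e.g. 'AT' -> 1 for 'A'."""
    return [g.count(counted_allele) for g in genotypes]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def needs_strand_swap(study_target, study_neighbours, ref_target,
                      ref_neighbours, counted_allele, neighbour_alleles,
                      min_abs_r=0.3):
    """Vote over neighbours: True = swap, False = keep, None = undecided."""
    agree = disagree = 0
    for s_nb, r_nb, nb_allele in zip(study_neighbours, ref_neighbours,
                                     neighbour_alleles):
        r_study = pearson(dosages(study_target, counted_allele),
                          dosages(s_nb, nb_allele))
        r_ref = pearson(dosages(ref_target, counted_allele),
                        dosages(r_nb, nb_allele))
        # Skip neighbours in weak LD with the target in either dataset.
        if abs(r_study) < min_abs_r or abs(r_ref) < min_abs_r:
            continue
        if r_study * r_ref > 0:
            agree += 1
        else:
            disagree += 1
    if agree == disagree:
        return None
    return disagree > agree


# Toy data: in the reference, target allele A travels with neighbour allele C.
REF_TARGET = ["AA", "AT", "TT", "AT", "AA", "TT"]
REF_NEIGHBOUR = ["CC", "CT", "TT", "CT", "CC", "TT"]
# The study reports the A/T SNP on the opposite strand (A and T exchanged).
STUDY_TARGET = ["TT", "AA", "TA", "AA", "TA"]
STUDY_NEIGHBOUR = ["CC", "TT", "CT", "TT", "CT"]

if __name__ == "__main__":
    swap = needs_strand_swap(STUDY_TARGET, [STUDY_NEIGHBOUR], REF_TARGET,
                             [REF_NEIGHBOUR], "A", ["C"])
    aligned = [swap_strand(g) for g in STUDY_TARGET] if swap else STUDY_TARGET
    print(swap, aligned)
```

In the toy data the correlation is positive in the reference and negative in the study, so the vote favours swapping the study genotypes to their complements. A production tool would need more neighbours, a principled LD threshold, and handling of missing genotypes.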