Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data

Cited by: 7
Authors
Eller, Ryan J. [1]
Janga, Sarath C. [2, 3]
Walsh, Susan [1]
Affiliations
[1] Indiana Univ Purdue Univ, Dept Biol, 723 W Michigan St, Indianapolis, IN 46205 USA
[2] Indiana Univ Sch Med, Ctr Computat Biol & Bioinformat, 5021 Hlth Informat & Translat Sci HITS, Indianapolis, IN 46202 USA
[3] Indiana Univ Purdue Univ, Sch Informat & Comp, Dept Biohlth Informat, Indianapolis, IN 46202 USA
Funding
National Science Foundation (USA);
Keywords
Imputation; Phasing; Pipeline; Genome-wide-association study; Admixture; Odyssey; MIXED-MODEL ANALYSIS; GENOTYPE IMPUTATION; ASSOCIATION; HAPLOTYPES; SEQUENCE;
DOI
10.1186/s12859-019-2964-5
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background: Genome imputation, admixture resolution, and genome-wide association analyses are time-consuming and computationally intensive processes with many composite and requisite steps. Analysis time increases further when the programs required for these analyses must be built and installed. For scientists who are less versed in programming but want to perform these operations hands-on, there is a lengthy learning curve to using the vast number of programs available for these analyses.
Results: To streamline the entire process into easy-to-use steps for scientists working with big data, the Odyssey pipeline was developed. Odyssey is a simplified, efficient, semi-automated genome-wide imputation and analysis pipeline that prepares raw genetic data and performs pre-imputation quality control, phasing, imputation, post-imputation quality control, population stratification analysis, and genome-wide association with statistical data analysis, including result visualization. It integrates programs such as PLINK, SHAPEIT, Eagle, IMPUTE, Minimac, and several R packages into a seamless, easy-to-use, modular workflow controlled via a single user-friendly configuration file. Odyssey was built with compatibility in mind and therefore uses the Singularity container solution, which can be run on Linux, macOS, and Windows platforms. It also scales easily from a simple desktop to a High-Performance System (HPS).
Conclusion: Odyssey facilitates efficient and fast automation of genome-wide association analysis and can go from raw genetic data to genome:phenome association visualization and analysis results in 3-8 h on average, depending on the input data, the choice of programs within the pipeline, and the available computing resources. Odyssey was built to be flexible, portable, compatible, scalable, and easy to set up. Biologists less familiar with programming can now work hands-on with their own big data using this easy-to-use pipeline.
Pages: 8
Related papers
50 in total
  • [1] Odyssey: a semi-automated pipeline for phasing, imputation, and analysis of genome-wide genetic data
    Ryan J. Eller
    Sarath C. Janga
    Susan Walsh
    [J]. BMC Bioinformatics, 20
  • [2] Genome-Wide Semi-Automated Annotation of Transporter Systems
    Dias, Oscar
    Gomes, Daniel
    Vilaca, Paulo
    Cardoso, Joao
    Rocha, Miguel
    Ferreira, Eugenio C.
    Rocha, Isabel
    [J]. IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2017, 14 (02) : 443 - 456
  • [3] genipe: an automated genome-wide imputation pipeline with automatic reporting and statistical tools
    Perreault, Louis-Philippe Lemieux
    Legault, Marc-Andre
    Asselin, Geraldine
    Dube, Marie-Pierre
    [J]. BIOINFORMATICS, 2016, 32 (23) : 3661 - 3663
  • [4] EagleImp: fast and accurate genome-wide phasing and imputation in a single tool
    Wienbrandt, Lars
    Ellinghaus, David
    [J]. BIOINFORMATICS, 2022, 38 (22) : 4999 - 5006
  • [5] Accurate genome-wide phasing from IBD data
    Noto, Keith
    Ruiz, Luong
    [J]. BMC BIOINFORMATICS, 2022, 23 (01)
  • [6] Accurate genome-wide phasing from IBD data
    Keith Noto
    Luong Ruiz
    [J]. BMC Bioinformatics, 23
  • [7] An Analysis Pipeline for Genome-wide Association Studies
    Stefanov, Stefan
    Lautenberger, James
    Gold, Bert
    [J]. CANCER INFORMATICS, 2008, 6 : 455 - +
  • [8] Fast and accurate genotype imputation in genome-wide association studies through pre-phasing
    Bryan Howie
    Christian Fuchsberger
    Matthew Stephens
    Jonathan Marchini
    Gonçalo R Abecasis
    [J]. Nature Genetics, 2012, 44 : 955 - 959
  • [9] Fast and accurate genotype imputation in genome-wide association studies through pre-phasing
    Howie, Bryan
    Fuchsberger, Christian
    Stephens, Matthew
    Marchini, Jonathan
    Abecasis, Goncalo R.
    [J]. NATURE GENETICS, 2012, 44 (08) : 955 - +
  • [10] A GOMSL Analysis of Semi-Automated Data Entry
    Haimson, Craig
    Grossman, Justin
    [J]. EICS'09: PROCEEDINGS OF THE ACM SIGCHI SYMPOSIUM ON ENGINEERING INTERACTIVE COMPUTING SYSTEMS, 2009, : 61 - 65