GRIEVOUS: your command-line general for resolving cross-dataset genotype inconsistencies

Authors
Talwar, James V. [1, 2]
Klie, Adam [1, 2]
Pagadala, Meghana S. [3]
Carter, Hannah [1, 2, 4]
Affiliations
[1] Univ Calif San Diego, Dept Med, Div Med Genet, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Bioinformat & Syst Biol Program, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Biomed Sci Program, La Jolla, CA 92093 USA
[4] Univ Calif San Diego, Moores Canc Ctr, La Jolla, CA 92093 USA
Funding
U.S. National Institutes of Health
Keywords
QUALITY-CONTROL; GENOME; METAANALYSIS; ASSOCIATION; IMPUTATION
DOI
10.1093/bioinformatics/btae489
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Harmonizing variant indexing and allele assignments across datasets is crucial for data integrity in cross-dataset studies such as multi-cohort genome-wide association studies, meta-analyses, and the development, validation, and application of polygenic risk scores. Ensuring this indexing and allele consistency is a laborious, time-consuming, and error-prone process requiring a certain degree of computational proficiency. Here, we introduce GRIEVOUS, a command-line tool for cross-dataset variant homogenization. By means of an internal database and a custom indexing methodology, GRIEVOUS identifies, formats, and aligns all biallelic single nucleotide polymorphisms (SNPs) across all summary statistic and genotype files of interest. Upon completion of dataset harmonization, GRIEVOUS can also be used to extract the maximal set of biallelic SNPs common to all datasets.

Availability and implementation: GRIEVOUS and all supporting documentation and tutorials can be found at https://github.com/jvtalwar/GRIEVOUS. It is freely and publicly available under the MIT license and can be installed via pip.
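To make the harmonization problem in the abstract concrete, the following is a minimal Python sketch of one common way to align biallelic SNPs to a reference orientation and then intersect datasets. It is not GRIEVOUS's implementation or interface: the function names (`classify`, `harmonize`, `common_snps`), the `(chromosome, position)` key, and the choice to discard palindromic A/T and C/G sites are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set, Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

Key = Tuple[str, int]        # (chromosome, position); hypothetical indexing scheme
Alleles = Tuple[str, str]    # (reference allele, alternate allele)


def classify(ref: Alleles, obs: Alleles) -> Optional[str]:
    """Return how `obs` relates to `ref`, or None if it cannot be aligned."""
    # Only single-nucleotide A/C/G/T alleles are handled in this sketch.
    if any(a not in COMPLEMENT for a in ref + obs):
        return None
    # Palindromic SNPs (A/T, C/G) are strand-ambiguous; drop them here.
    if ref[0] == COMPLEMENT[ref[1]]:
        return None
    flipped = (COMPLEMENT[obs[0]], COMPLEMENT[obs[1]])
    if obs == ref:
        return "match"
    if obs == ref[::-1]:
        return "swap"          # same strand, alleles reversed
    if flipped == ref:
        return "flip"          # opposite strand
    if flipped == ref[::-1]:
        return "flip_swap"     # opposite strand and alleles reversed
    return None


def harmonize(
    dataset: Dict[Key, Tuple[str, str, float]],
    reference: Dict[Key, Alleles],
) -> Dict[Key, float]:
    """Re-express each effect size relative to the reference alternate allele."""
    aligned: Dict[Key, float] = {}
    for key, (a1, a2, effect) in dataset.items():
        if key not in reference:
            continue
        status = classify(reference[key], (a1, a2))
        if status is None:
            continue
        # A swapped allele pair inverts the sign of the effect estimate.
        aligned[key] = -effect if "swap" in status else effect
    return aligned


def common_snps(*datasets: Dict[Key, float]) -> Set[Key]:
    """Maximal set of SNP keys present in every harmonized dataset."""
    if not datasets:
        return set()
    shared = set(datasets[0])
    for d in datasets[1:]:
        shared &= set(d)
    return shared


reference = {("1", 100): ("A", "G"), ("1", 200): ("C", "T")}
cohort_a = {("1", 100): ("G", "A", 0.5), ("1", 200): ("C", "T", -0.2)}
cohort_b = {("1", 100): ("T", "C", 0.3), ("1", 300): ("A", "C", 0.1)}

harmonized_a = harmonize(cohort_a, reference)
harmonized_b = harmonize(cohort_b, reference)
shared = common_snps(harmonized_a, harmonized_b)
```

In this toy example only the SNP at position 100 survives in both cohorts: cohort A reports it with swapped alleles (so its effect sign is inverted), and cohort B reports it on the opposite strand.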
Pages: 5