Ultrafast and scalable variant annotation and prioritization with big functional genomics data

Times cited: 21
Authors
Huang, Dandan [1 ,2 ,3 ]
Yi, Xianfu [4 ]
Zhou, Yao [2 ]
Yao, Hongcheng [5 ]
Xu, Hang [1 ,5 ]
Wang, Jianhua [2 ]
Zhang, Shijie [2 ]
Nong, Wenyan [6 ]
Wang, Panwen [7 ,8 ]
Shi, Lei [3 ]
Xuan, Chenghao [3 ]
Li, Miaoxin [9 ]
Wang, Junwen [7 ,8 ]
Li, Weidong [10 ]
Kwan, Hoi Shan [6 ]
Sham, Pak Chung [11 ]
Wang, Kai [12 ]
Li, Mulin Jun [1 ,2 ,13 ]
Affiliations
[1] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Prov & Minist Cosponsored Collaborat Innovat Ctr, Natl Clin Res Ctr Canc, Tianjin 300070, Peoples R China
[2] Tianjin Med Univ, Sch Basic Med Sci, Dept Pharmacol, Tianjin Key Lab Inflammat Biol, Tianjin 300070, Peoples R China
[3] Tianjin Med Univ, Sch Basic Med Sci, Dept Biochem & Mol Biol, Tianjin 300070, Peoples R China
[4] Tianjin Med Univ, Sch Biomed Engn, Tianjin 300070, Peoples R China
[5] Univ Hong Kong, Sch Biomed Sci, LKS Fac Med, Hong Kong 999077, Peoples R China
[6] Chinese Univ Hong Kong, Sch Life Sci, Hong Kong 999077, Peoples R China
[7] Mayo Clin, Dept Hlth Sci Res, Scottsdale, AZ 85259 USA
[8] Mayo Clin, Ctr Individualized Med, Scottsdale, AZ 85259 USA
[9] Sun Yat Sen Univ, Zhongshan Sch Med, Affiliated Hosp 1, Ctr Genome Res, Ctr Precis Med, Guangzhou 510080, Peoples R China
[10] Tianjin Med Univ, Sch Basic Med Sci, Dept Genet, Tianjin 300070, Peoples R China
[11] Univ Hong Kong, LKS Fac Med, Dept Psychiat, Ctr Genom Sci, Hong Kong 999077, Peoples R China
[12] Childrens Hosp Philadelphia, Raymond G Perelman Ctr Cellular & Mol Therapeut, Philadelphia, PA 19104 USA
[13] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjin Key Lab Mol Canc Epidemiol, Dept Epidemiol & Biostat, Tianjin 300070, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DNA ELEMENTS; FRAMEWORK; IDENTIFICATION; ENCYCLOPEDIA; PREDICTION; MUTATIONS; DISCOVERY; LOCI
DOI
10.1101/gr.267997.120
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
The advances of large-scale genomics studies have enabled compilation of cell type-specific, genome-wide DNA functional elements at high resolution. With the growing volume of functional annotation data and sequencing variants, existing variant annotation algorithms lack the efficiency and scalability to process big genomic data, particularly when annotating whole-genome sequencing variants against a huge database with billions of genomic features. Here, we develop VarNote to rapidly annotate genome-scale variants in large and complex functional annotation resources. Equipped with a novel index system and a parallel random-sweep searching algorithm, VarNote shows substantial performance improvements (two to three orders of magnitude) over existing algorithms at different scales. It supports both region-based and allele-specific annotations and introduces advanced functions for the flexible extraction of annotations. By integrating massive base-wise and context-dependent annotations in the VarNote framework, we introduce three efficient and accurate pipelines to prioritize the causal regulatory variants for common diseases, Mendelian disorders, and cancers.
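The abstract distinguishes region-based from allele-specific annotation. To illustrate only that distinction, the following Python sketch annotates sorted variants against sorted genomic features in a single linear sweep per chromosome. It is not VarNote's index system or its parallel random-sweep algorithm. The `Variant`/`Feature` record layouts, the 1-based inclusive coordinates, and the `sweep_annotate` function are hypothetical assumptions made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int  # hypothetical layout: 1-based position of the first REF base
    ref: str
    alt: str


class Feature(NamedTuple):
    chrom: str
    start: int  # hypothetical layout: 1-based, inclusive
    end: int    # 1-based, inclusive
    ref: str    # "" for plain regions such as a peak or enhancer
    alt: str
    label: str


def _by_chrom(items, key):
    """Group records by chromosome and sort each group with `key`."""
    groups = defaultdict(list)
    for item in items:
        groups[item.chrom].append(item)
    for chrom in groups:
        groups[chrom].sort(key=key)
    return groups


def sweep_annotate(variants, features, allele_specific=False):
    """Return {variant: [feature labels]} using one sorted sweep per chromosome.

    Region-based mode reports every feature overlapping the variant's REF span;
    allele-specific mode additionally requires matching start, REF and ALT.
    """
    result = {v: [] for v in variants}
    var_groups = _by_chrom(set(variants), key=lambda v: v.pos)  # dedupe variants
    feat_groups = _by_chrom(features, key=lambda f: f.start)
    for chrom, vs in var_groups.items():
        fs = feat_groups.get(chrom, [])
        nxt = 0      # index of the next feature not yet admitted
        active = []  # admitted features that may still overlap later variants
        for v in vs:
            v_end = v.pos + max(len(v.ref), 1) - 1
            # Admit features that start at or before this variant's last base.
            while nxt < len(fs) and fs[nxt].start <= v_end:
                active.append(fs[nxt])
                nxt += 1
            # Variants are sorted by pos, so a feature ending before v.pos
            # cannot overlap this variant or any later one on this chromosome.
            active = [f for f in active if f.end >= v.pos]
            for f in active:
                if f.start > v_end:
                    continue  # admitted earlier by a longer variant
                if allele_specific and (f.start, f.ref, f.alt) != (v.pos, v.ref, v.alt):
                    continue
                result[v].append(f.label)
    return result


if __name__ == "__main__":
    variants = [Variant("chr1", 100, "A", "G"), Variant("chr1", 200, "C", "T")]
    features = [
        Feature("chr1", 90, 150, "", "", "enhancer_1"),
        Feature("chr1", 100, 100, "A", "G", "score_A>G"),
        Feature("chr1", 100, 100, "A", "C", "score_A>C"),
    ]
    print(sweep_annotate(variants, features)[variants[0]])
    # ['enhancer_1', 'score_A>G', 'score_A>C']
    print(sweep_annotate(variants, features, allele_specific=True)[variants[0]])
    # ['score_A>G']
```

After sorting, each feature is admitted once and dropped once per chromosome. The remaining work is proportional to the number of features held in the active list at each variant. A plain sweep like this still reads every feature on the chromosome. An index that supports random access lets a tool skip stretches of a large annotation file that contain no query variants, which is presumably why the abstract pairs its searching algorithm with an index system.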
Pages: 1789-1801
Number of pages: 14