FILER: a framework for harmonizing and querying large-scale functional genomics knowledge

Cited by: 4
Authors
Kuksa, Pavel P. [1 ,2 ]
Leung, Yuk Yee [1 ,2 ]
Gangadharan, Prabhakaran [1 ,2 ]
Katanic, Zivadin [1 ,2 ]
Kleidermacher, Lauren [3 ]
Amlie-Wolf, Alexandre [1 ,2 ]
Lee, Chien-Yueh [1 ,2 ]
Qu, Liming [1 ,2 ]
Greenfest-Allen, Emily [1 ,4 ]
Valladares, Otto [1 ,2 ]
Wang, Li-San [1 ,2 ]
Affiliations
[1] Univ Penn, Penn Neurodegenerat Genom Ctr, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[2] Univ Penn, Perelman Sch Med, Dept Pathol & Lab Med, Philadelphia, PA 19104 USA
[3] Univ Penn, Coll Arts & Sci, Dept Biol, Philadelphia, PA 19104 USA
[4] Univ Penn, Perelman Sch Med, Dept Genet, Philadelphia, PA 19104 USA
Keywords
DNA ELEMENTS; ENCODE; ENCYCLOPEDIA; DATABASE
DOI
10.1093/nargab/lqab123
Chinese Library Classification
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources and data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge that couples a large, curated, integrated catalog of harmonized functional genomic and annotation data with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate a user's experimental data. This resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows that FILER is highly scalable, with a sub-linear 32-fold increase in querying time when the number of queries increases 1000-fold, from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating and querying large-scale genomic data within analyses and workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
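To put the reported benchmark in perspective, the two figures in the abstract (a 32-fold increase in querying time for a 1000-fold increase in the number of query intervals) can be turned into an implied scaling exponent. The sketch below assumes a simple power-law relation, time ∝ queries^k, fitted through just those two points; that model and the function name `scaling_exponent` are illustrative assumptions, not part of the authors' analysis.

```python
import math


def scaling_exponent(query_ratio: float, time_ratio: float) -> float:
    """Exponent k in time ∝ queries**k implied by two benchmark points.

    Assumes a pure power law; with only two points this is a rough
    summary, not a fitted model.
    """
    return math.log(time_ratio) / math.log(query_ratio)


# Ratios taken from the abstract: 1000 -> 1 000 000 intervals, 32-fold time.
k = scaling_exponent(query_ratio=1000, time_ratio=32)
print(f"implied exponent: {k:.2f}")
```

Under this assumption the exponent comes out close to 0.5, well below the value of 1 that linear scaling would give, which is consistent with the abstract's description of the increase as sub-linear.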
Pages: 11