Massively Parallel Processing Database for Sequence and Graph Data Structures Applied to Rapid-Response Drug Repurposing

Cited by: 3
Authors
Rickett, Christopher D. [1 ]
Maschhoff, Kristyn J. [1 ]
Sukumar, Sreenivas R. [1 ]
Affiliations
[1] Hewlett Packard Enterprise, Spring, TX 77389 USA
Keywords
graph database; graph analytics; in-database analytics; distributed processing; parallel processing; sequence analytics
DOI
10.1109/BigData50022.2020.9378331
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present the application of a massively parallel processing graph database to rapid-response drug repurposing. The novelty of our approach is that the scalable graph database both hosts a knowledge graph of medically relevant facts integrated from multiple knowledge sources and acts as a computational engine capable of in-database protein sequence analytics. We demonstrate the performance of the graph database on a real-world use case, hypothesizing cures for COVID-19, leveraging its built-in accelerated protein-sequence matching capabilities at unprecedented scale (to simultaneously handle the data size and query latency requirements of interactive research). Based on supporting evidence from the medical literature, we show that computing the similarity of COVID-19 virus proteins across 4 million other open-science sequences and intelligently traversing over 150 billion facts from open-science medical knowledge produces biologically insightful results. By presenting sample queries and extending the application to use cases beyond COVID-19, we demonstrate the use and value of the novel database for hypothesis generation, reducing time-to-insight and increasing researcher productivity through interactivity.
Pages: 2967-2976
Number of pages: 10
Related papers
4 items in total
  • [1] Beyond Conventional Data Warehousing - Massively Parallel Data Processing with Greenplum Database (Invited Talk)
    Waas, Florian M.
    BUSINESS INTELLIGENCE FOR THE REAL-TIME ENTERPRISE, 2009, 27: 89-96
  • [2] Massively Parallel Processing of Whole Genome Sequence Data: An In-Depth Performance Study
    Roy, Abhishek
    Diao, Yanlei
    Evani, Uday
    Abhyankar, Avinash
    Howarth, Clinton
    Le Priol, Remi
    Bloom, Toby
    SIGMOD'17: PROCEEDINGS OF THE 2017 ACM INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2017: 187-202
  • [3] Rapid, Massively Parallel Single-Cell Drug Response Measurements via Live Cell Interferometry
    Reed, Jason
    Chun, Jennifer
    Zangle, Thomas A.
    Kalim, Sheraz
    Hong, Jason S.
    Pefley, Sarah E.
    Zheng, Xin
    Gimzewski, James K.
    Teitell, Michael A.
    BIOPHYSICAL JOURNAL, 2011, 101 (05): 1025-1031
  • [4] Detection of adverse drug reactions: evaluation of an automatic data processing applied in oncology performed in the French Diagnosis Related Groups database
    Quillet, Alexandre
    Colin, Olivier
    Bourgeois, Nicolas
    Favreliere, Sylvie
    Ferru, Aurelie
    Boinot, Laurence
    Lafay-Chebassier, Claire
    Perault-Pochat, Marie-Christine
    FUNDAMENTAL & CLINICAL PHARMACOLOGY, 2018, 32 (02): 227-233