GRAPHONE: A Data Store for Real-time Analytics on Evolving Graphs

Cited by: 31
Authors
Kumar, Pradeep [1]
Huang, H. Howie [2]
Affiliations
[1] William & Mary, Dept Comp Sci, 251 Jamestown Rd, Williamsburg, VA 23185 USA
[2] George Washington Univ, Dept Elect & Comp Engn, 800 22nd St NW, Washington, DC 20052 USA
Funding
National Science Foundation (USA);
Keywords
Graph systems; graph data management; unified graph data store; batch analytics; stream analytics; INTERNET; ALGORITHM;
DOI
10.1145/3364180
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
There is a growing need to perform a diverse set of real-time analytics (batch and stream analytics) on evolving graphs to deliver the value of big data to users. The key requirement of such applications is a data store that supports their diverse data access efficiently while concurrently ingesting fine-grained updates at high velocity. Unfortunately, current graph systems, whether graph databases or analytics engines, are not designed to achieve high performance for both operations; rather, each excels in one area and keeps a private data store specialized to favor its own operations. To address this challenge, we have designed and developed GRAPHONE, a graph data store that abstracts the graph data store away from the specialized systems to solve the fundamental research problems associated with data store design. It combines two complementary graph storage formats (edge list and adjacency list) and uses dual versioning to decouple graph computations from updates. Importantly, it presents a new data abstraction, GraphView, to enable data access at two different granularities of data ingestion (called data visibility) for concurrent execution of diverse classes of real-time graph analytics with only a small amount of data duplication. Experimental results show that GRAPHONE delivers 11.40x and 5.36x average speedup in ingestion rate against LLAMA and Stinger, the two state-of-the-art dynamic graph systems, respectively. Further, it achieves an average speedup of 8.75x and 4.14x against LLAMA and 12.80x and 3.18x against Stinger for BFS and PageRank analytics (batch version), respectively. GRAPHONE also gains over 2,000x speedup against Kickstarter, a state-of-the-art stream analytics engine, in ingesting the streaming edges and performing streaming BFS when the first half of a synthetic graph is treated as a base snapshot and the rest as streaming edges. GRAPHONE also achieves an ingestion rate two to three orders of magnitude higher than graph databases. Finally, we demonstrate that it is possible to run concurrent stream analytics from the same data store.
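As a rough illustration of pairing an edge list with an adjacency list, the sketch below keeps an append-only edge log for fine-grained ingestion and folds it into an adjacency list in batches. It is a toy model under stated assumptions, not GRAPHONE's actual design or API: the class name `HybridGraphStore`, the methods `add_edge`, `archive`, and `neighbors`, and the `include_pending` flag are all hypothetical, edges are directed, and versioning, deletion, and concurrency are omitted.

```python
from collections import defaultdict


class HybridGraphStore:
    """Toy store (hypothetical): an append-only edge log plus a batched adjacency list."""

    def __init__(self):
        self.edge_log = []                  # every edge, in arrival order
        self.adjacency = defaultdict(list)  # edges archived so far, keyed by source
        self.archived_upto = 0              # edge_log[:archived_upto] is archived

    def add_edge(self, src, dst):
        # Fine-grained ingestion: a cheap append, no adjacency update.
        self.edge_log.append((src, dst))

    def archive(self):
        # Coarse-grained step: fold all pending log entries into the adjacency list.
        for src, dst in self.edge_log[self.archived_upto:]:
            self.adjacency[src].append(dst)
        self.archived_upto = len(self.edge_log)

    def neighbors(self, src, include_pending=True):
        # Archived neighbors come from the adjacency list; pending ones
        # require a scan of the not-yet-archived tail of the log.
        result = list(self.adjacency.get(src, []))
        if include_pending:
            result += [d for s, d in self.edge_log[self.archived_upto:] if s == src]
        return result


store = HybridGraphStore()
store.add_edge(1, 2)
store.add_edge(1, 3)
store.archive()
store.add_edge(1, 4)                                # arrives after the last archive
coarse = store.neighbors(1, include_pending=False)  # archived edges only
fine = store.neighbors(1)                           # archived plus pending edges
```

In this toy model the two calls to `neighbors` loosely mirror the two granularities of data visibility the abstract mentions: one sees only edges already folded into the adjacency list, the other also sees edges still sitting in the log.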
Pages: 40