On Mixing High-Speed Updates and In-Memory Queries: A Big-Data Architecture for Real-time Analytics

Citations: 0
Authors
Zhong, Tao [1]
Doshi, Kshitij A. [1]
Tang, Xi [1]
Lou, Ting [1]
Lu, Zhongyan [1]
Li, Hong [1]
Affiliations
[1] Intel, Software & Serv Grp, Santa Clara, CA 95052 USA
Keywords
Big Data; Real-time; Low-latency; Analytics; Resilient Distributed Datasets; CRUD; Clustering
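DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract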
Up-to-date business intelligence has become a critical differentiator for the modern data-driven, highly engaged enterprise. It requires rapid, continuous integration of new information for subsequent analyses. ETL-based, traditionally batch-oriented methods of absorbing changes into a relational database schema take time and are therefore incompatible with the very low-latency demands of real-time analytics. Instead, in-memory clustered stores that employ tunable consistency mechanisms are becoming attractive, since they dispense with the need to transform and move data between storage layouts and tiers. When data is updated infrequently, in-memory approaches such as RDD transformations in Spark can suffice, but as updates become frequent, such approaches need to be extended to support dynamic datasets. This paper describes a few key additional requirements that arise from supporting in-memory processing of data while updates proceed concurrently, and presents Real-time Analytics Foundation (RAF), an architecture to meet them. Performance of an early implementation of RAF is also described: for an unaudited TPC-H derived workload, RAF shows a node-to-node scaling ratio of 88% at 8 nodes, and for a query equivalent to Q6 in the TPC-H set, RAF shows a 9x improvement over Hive-Hadoop. The paper also describes two RAF-based solutions that are being put together by two independent software vendors in China.
Pages: 8