A High-Performance Distributed Relational Database System for Scalable OLAP Processing

Cited by: 5
Authors
Arnold, Jason [1]
Glavic, Boris [1]
Raicu, Ioan [1]
Affiliation
[1] IIT, Chicago, IL 60616 USA
Funding
National Science Foundation (USA);
Keywords
SQL; big data; distributed query processing;
DOI
10.1109/IPDPS.2019.00083
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The scalability of systems such as Hive and Spark SQL that are built on top of big data platforms has enabled query processing over very large data sets. However, the per-node performance of these systems is typically low compared to traditional relational databases. Conversely, Massively Parallel Processing (MPP) databases do not scale as well as these systems. We present HRDBMS, a fully implemented distributed shared-nothing relational database developed with the goal of improving the scalability of OLAP queries. HRDBMS achieves high scalability through a principled combination of techniques from relational and big data systems with novel communication and work-distribution techniques. While we also support serializable transactions, the system has not been optimized for this use case. HRDBMS runs on a custom distributed and asynchronous execution engine that was built from the ground up to support highly parallelized operator implementations. Our experimental comparison with Hive, Spark SQL, and Greenplum confirms that HRDBMS's scalability is on par with Hive and Spark SQL (up to 96 nodes) while its per-node performance can compete with MPP databases like Greenplum.
Pages: 738-748
Number of pages: 11
Related papers
50 in total
  • [1] High-Performance Query Processing of a Real-World OLAP Database with ParGRES
    Paes, Melissa
    Lima, Alexandre A. B.
    Valduriez, Patrick
    Mattoso, Marta
    [J]. HIGH PERFORMANCE COMPUTING FOR COMPUTATIONAL SCIENCE - VECPAR 2008, 2008, 5336 : 188 - +
  • [2] Ceph: A scalable, high-performance distributed file system
    Weil, Sage A.
    Brandt, Scott A.
    Miller, Ethan L.
    Long, Darrell D. E.
    Maltzahn, Carlos
    [J]. USENIX ASSOCIATION 7TH USENIX SYMPOSIUM ON OPERATING SYSTEMS DESIGN AND IMPLEMENTATION, 2006, : 307 - +
  • [3] A High-Performance and Scalable Distributed Storage and Computing System for IMS Services
    Seraoui, Youssef
    Bellafkih, Mostafa
    Raouyane, Brahim
    [J]. 2016 2ND INTERNATIONAL CONFERENCE ON CLOUD COMPUTING TECHNOLOGIES AND APPLICATIONS (CLOUDTECH), 2016, : 335 - 342
  • [4] OLAP support in an object-relational database system
    Cook, RP
    [J]. IKE'03: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE ENGINEERING, VOLS 1 AND 2, 2003, : 560 - 565
  • [5] Scalable and Flexible High-Performance In-Network Processing of Hash Joins in Distributed Databases
    Wirth, Johannes
    Hofmann, Jaco A.
    Thostrup, Lasse
    Binnig, Carsten
    Koch, Andreas
    [J]. 2021 INTERNATIONAL CONFERENCE ON FIELD-PROGRAMMABLE TECHNOLOGY (ICFPT), 2021, : 212 - 220
  • [6] CONTENT: A practical, scalable, high-performance multimedia database
    Yapp, L
    Yamashita, C
    Zick, G
    [J]. ACM DIGITAL LIBRARIES '97, 1997, : 185 - 192
  • [7] An Extended IMS Framework With a High-Performance and Scalable Distributed Storage and Computing System
    Seraoui, Youssef
    Raouyane, Brahim
    Bellafkih, Mostafa
    [J]. 2017 INTERNATIONAL SYMPOSIUM ON NETWORKS, COMPUTERS AND COMMUNICATIONS (ISNCC), 2017,
  • [8] Grasper: A High Performance Distributed System for OLAP on Property Graphs
    Chen, Hongzhi
    Li, Changji
    Fang, Juncheng
    Huang, Chenghuan
    Cheng, James
    Zhang, Jian
    Hou, Yifan
    Yan, Xiao
    [J]. PROCEEDINGS OF THE 2019 TENTH ACM SYMPOSIUM ON CLOUD COMPUTING (SOCC '19), 2019, : 87 - 100
  • [9] VOLAP: A Scalable Distributed System for Real-Time OLAP with High Velocity Data
    Dehne, Frank
    Robillard, David
    Rau-Chaplin, Andrew
    Burke, Neil
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON CLUSTER COMPUTING (CLUSTER), 2016, : 354 - 363
  • [10] Scalable Linear Algebra on a Relational Database System
    Luo, Shangyu
    Gao, Zekai J.
    Gubanov, Michael
    Perez, Luis L.
    Jankov, Dimitrije
    Jermaine, Christopher
    [J]. COMMUNICATIONS OF THE ACM, 2020, 63 (08) : 93 - 101