An In-memory Database Implementation Technique based on Separation of Management, Computation and Storage

Cited by: 0
Authors
Zhang Y.-S. [1 ,2 ,3 ]
Han R.-C. [1 ,2 ]
Liu Z. [4 ]
Zhang Y. [1 ,2 ,3 ]
Affiliations
[1] Key Laboratory of Data Engineering and Knowledge Engineering (Renmin University of China), Ministry of Education, Beijing
[2] School of Information, Renmin University of China, Beijing
[3] National Survey Research Center, Renmin University of China, Beijing
[4] Intel China Research Center Ltd, Beijing
[5] National Satellite Meteorological Centre, Beijing
Funding
National Natural Science Foundation of China
Keywords
in-memory database; separation of data and metadata; separation of management, compute and storage; separation of storage and compute; vector index
DOI
10.11897/SP.J.1016.2023.00761
Abstract
Heterogeneous storage/computing platforms have become mainstream high-performance computing platforms with the support of multicore processors, large memory, and non-volatile memory techniques. Traditional database engines co-design storage and compute, while emerging databases employ storage-compute separation and compute-pushdown techniques on top of novel distributed storage infrastructures. This paper introduces a novel in-memory database implementation based on the separation of management, compute, and storage. Building on storage-compute separation, it further divides the dataset into a metadata set and a value set according to the characteristics of the database schema, data distribution, and workload. The unified query engine is divided into a metadata management engine, a computing engine, and a storage engine. Metadata management, which carries semantic information, is abstracted as an independent management layer, while non-semantic value storage and compute are abstracted as the compute and storage layers: compute-intensive workloads are assigned to the compute layer, data-intensive workloads to the storage layer, and the two layers can be combined or separated according to the hardware configuration. The in-memory database implementation is designed at the following levels: 1) schema optimization, separating values and metadata in the database so that different storage and compute strategies can be chosen according to the inherent data features; 2) data model optimization, where the Fusion OLAP model supports high-performance multidimensional compute on a relational storage model; 3) algorithm optimization, using a surrogate-key index and a vector index to support optimal vector join and vector aggregation for higher OLAP performance; 4) system design optimization, where the layered database engine separates management from compute and storage from compute, and pushes multidimensional compute down to the storage layer.
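The surrogate-key vector join and vector aggregation named in optimization 3) can be sketched in a few lines. This is a minimal illustration under hypothetical names, not the paper's code: because dimension rows are addressed by dense surrogate keys, the star join degrades to positional array lookups, and aggregation becomes a vector-indexed accumulator instead of a hash table.

```python
# Hypothetical sketch of surrogate-key vector join + vector aggregation.
# Dimension surrogate keys are implicit array positions 0..n-1, so the
# predicate result can be materialized as a vector index: each slot holds
# either a group number or None (row filtered out).
dim_filter = [None, 0, None, 1]   # illustrative predicate/grouping result

# Fact-table foreign-key column references dimension surrogate keys.
fact_fk  = [1, 3, 0, 1, 3, 3]
fact_val = [10, 20, 5, 7, 3, 4]

# Vector aggregation: one pass over the fact table, no hash probes.
num_groups = 2
agg = [0] * num_groups
for fk, v in zip(fact_fk, fact_val):
    slot = dim_filter[fk]          # O(1) positional lookup replaces hash join
    if slot is not None:
        agg[slot] += v

print(agg)  # → [17, 27]
```

The design point is that the dense surrogate key makes the join index-free: the foreign key itself is the dimension array offset, which is what enables the vectorized, cache-friendly aggregation the abstract claims.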
The experimental results show that the separation of management, compute, and storage can flexibly support hybrid CPU-GPU computing platforms, hybrid DRAM-PM (persistent memory) storage platforms, and external storage platforms. By employing the open-source in-memory column store Arrow as the database storage engine and pushing multidimensional compute down to the Arrow storage engine, the OLAP implementation achieves performance equal to an OLAP implementation co-designed for storage and compute on the Star Schema Benchmark, and it outperforms the leading in-memory databases Hyper and OmniSciDB as well as the Arrow-based GPU database PG-Strom. © 2023 Science Press. All rights reserved.
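The compute-pushdown pattern evaluated above can be sketched abstractly. This is a toy stand-in for the Arrow storage engine, not Arrow's actual API, and all names are hypothetical: the point is that the query layer ships the aggregate operator down to the storage layer rather than pulling raw columns up.

```python
# Toy sketch of storage-compute pushdown: the column store exposes a
# scan-with-aggregate entry point, so aggregation executes inside the
# storage layer. Illustrative only; not the Arrow API.
class ColumnStore:
    def __init__(self, columns):
        self.columns = columns            # column name -> list of values

    def scan_aggregate(self, value_col, key_col, num_groups):
        # The grouped sum runs where the data lives.
        out = [0] * num_groups
        for key, v in zip(self.columns[key_col], self.columns[value_col]):
            out[key] += v
        return out

store = ColumnStore({"fk": [0, 1, 0, 1, 1], "val": [1, 2, 3, 4, 5]})
# Query layer pushes the multidimensional aggregate down to storage:
result = store.scan_aggregate("val", "fk", num_groups=2)
print(result)  # → [4, 11]
```

Pushing the operator down avoids materializing raw columns in the query engine, which is the same motivation the abstract gives for pushing multidimensional compute into the Arrow storage engine.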
Pages: 761-779 (18 pages)