A Performance Study on Large-Scale Data Analytics Using Disk-Based and In-Memory Database Systems

Cited by: 0
Authors
Chao, Pingfu [1]
He, Dan [1]
Sadiq, Shazia [1]
Zheng, Kai [2]
Zhou, Xiaofang [1]
Affiliations
[1] Univ Queensland, Sch Informat Technol & Elect Engn, St Lucia, Qld, Australia
[2] Soochow Univ, Adv Data Analyt Lab, Suzhou, Peoples R China
Keywords
Data Warehousing; Performance Evaluation; Relational Database; In-memory Database
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
With the significant increase in memory size, in-memory database systems are becoming the dominant way of handling large-scale data analytics, compared with traditional disk-based systems such as data warehouses. Because their physical and logical designs differ significantly, the two kinds of system behave very differently on massive data analytics workloads. To examine these differences and the technical reasons behind them, we contrast the performance of disk-based data warehousing and in-memory database systems by comparing two state-of-the-art commercial systems on a large-scale real transportation dataset. This independent performance study reveals several interesting insights. Experimental evaluation shows that the in-memory system achieves competitive performance on most data analytics queries, with lower model maintenance cost and more flexibility, but it falls short in other cases. We summarise the results of our study and provide guidelines on how to select an appropriate system for a given data analytics task.
Pages: 247-254
Number of pages: 8