A Performance Study on Large-Scale Data Analytics Using Disk-Based and In-Memory Database Systems

Cited by: 0

Authors:

Chao, Pingfu [1]

He, Dan [1]

Sadiq, Shazia [1]

Zheng, Kai [2]

Zhou, Xiaofang [1]

Affiliations:

[1] Univ Queensland, Sch Informat Technol & Elect Engn, St Lucia, Qld, Australia

[2] Soochow Univ, Adv Data Analyt Lab, Suzhou, Peoples R China

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA AND SMART COMPUTING (BIGCOMP) | 2017

Keywords:

Data Warehousing; Performance Evaluation; Relational Database; In-memory Database

DOI:

Not available

Chinese Library Classification:

TP301 [Theory and methods]

Discipline classification code:

081202

Abstract:

With the significant increase in memory size, in-memory database systems are becoming the dominant way of dealing with large-scale data analytics, as compared to traditional disk-based systems such as data warehouses. Due to significant differences in both physical and logical design, these two kinds of systems show very different characteristics on massive data analytic workloads. To examine these differences and the technical reasons behind them, we contrast the performance of disk-based data warehousing and in-memory database systems by comparing two state-of-the-art commercial systems on a large-scale real transportation dataset. This independent performance study reveals several interesting insights. Experimental evaluation shows that the in-memory system can achieve competitive performance on most data analytics queries with lower model maintenance cost and more flexibility, but it falls short in other cases. We summarise the results of our study and provide guidelines on how to select an appropriate system for a given data analytics task.

Citation

Pages: 247-254

Page count: 8
