BlueDBM: Distributed Flash Storage for Big Data Analytics

Cited by: 15
Authors
Jun, Sang-Woo [1]
Liu, Ming [1]
Lee, Sungjin [2, 6]
Hicks, Jamey [3, 7]
Ankcorn, John [3, 4]
King, Myron [3, 8]
Xu, Shuotao [1]
Arvind [5]
Affiliations
[1] MIT, Stata Ctr, 32-G836,32 Vassar St, Cambridge, MA 02139 USA
[2] MIT, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] Quanta Res Cambridge, Cambridge, MA USA
[4] MIT, Stata Ctr, 32-G870,32 Vassar St, Cambridge, MA USA
[5] MIT, Stata Ctr, 32-G866,32 Vassar St, Cambridge, MA USA
[6] Inha Univ, Room 1010,High Tech Bldg,100 Inharo, Incheon, South Korea
[7] Accelerated Tech Inc, Cambridge, MA USA
[8] 38 Ashland St, Arlington, MA 02476 USA
Source
ACM Transactions on Computer Systems | 2016, Vol. 34, No. 3
Keywords
Wireless sensor networks; media access control; multichannel; radio interference; time synchronization;
DOI
10.1145/2898996
Chinese Library Classification number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
Complex data queries, because of their need for random accesses, have proven to be slow unless all the data can be accommodated in DRAM. There are many domains, such as genomics, geological data, and daily Twitter feeds, where the datasets of interest are 5TB to 20TB. For such a dataset, one would need a cluster with 100 servers, each with 128GB to 256GB of DRAM, to accommodate all the data in DRAM. On the other hand, such datasets could be stored easily in the flash memory of a rack-sized cluster. Flash storage has much better random access performance than hard disks, which makes it desirable for analytics workloads. However, currently available off-the-shelf flash storage packaged as SSDs does not make effective use of flash storage because it incurs a great amount of additional overhead during flash device management and network access. In this article, we present BlueDBM, a new system architecture that has flash-based storage with in-store processing capability and a low-latency high-throughput intercontroller network between storage devices. We show that BlueDBM outperforms a flash-based system without these features by a factor of 10 for some important applications. While the performance of a DRAM-centric system falls sharply even if only 5% to 10% of the references are to secondary storage, this sharp performance degradation is not an issue in BlueDBM. BlueDBM presents an attractive point in the cost/performance tradeoff for Big Data analytics.
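The sizing figures in the abstract can be checked with a short back-of-the-envelope sketch. The first function reproduces the stated cluster capacity (100 servers at 128GB to 256GB each). The second is a simple weighted-average latency model, offered only as an illustration of why a small fraction of references to secondary storage can dominate access time; the model, the function names, and the latency ratio of 1000 are assumptions and do not come from the article.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the abstract.
# Assumption: decimal units (1 TB = 1000 GB).
GB_PER_TB = 1000


def cluster_dram_tb(servers: int, dram_gb_per_server: int) -> float:
    """Aggregate DRAM capacity of a cluster, in TB."""
    return servers * dram_gb_per_server / GB_PER_TB


def relative_access_time(miss_fraction: float, latency_ratio: float) -> float:
    """Average access time relative to an all-DRAM system.

    Hypothetical model: a fraction `miss_fraction` of references goes to
    secondary storage that is `latency_ratio` times slower than DRAM.
    """
    return (1.0 - miss_fraction) + miss_fraction * latency_ratio


if __name__ == "__main__":
    # 100 servers with 128GB to 256GB of DRAM each (figures from the abstract).
    print(cluster_dram_tb(100, 128), cluster_dram_tb(100, 256))

    # Illustrative latency ratio only; not a measurement from the article.
    ASSUMED_RATIO = 1000.0
    for miss in (0.05, 0.10):
        print(miss, relative_access_time(miss, ASSUMED_RATIO))
```

Under these assumptions, the aggregate DRAM of such a cluster falls in the same range as the 5TB to 20TB datasets mentioned, and the average access time is governed almost entirely by the slower tier even at a 5% miss fraction.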
Pages: 31