Biscuit: A Framework for Near-Data Processing of Big Data Workloads

Times cited: 159
Authors
Gu, Boncheol [1]
Yoon, Andre S. [1]
Bae, Duck-Ho [1]
Jo, Insoon [1]
Lee, Jinyoung [1]
Yoon, Jonghyun [1]
Kang, Jeong-Uk [1]
Kwon, Moonsang [1]
Yoon, Chanho [1]
Cho, Sangyeun [1]
Jeong, Jaeheon [1]
Chang, Duckhyun [1]
Affiliation
[1] Samsung Electronics Co., Ltd., Memory Business, Suwon, South Korea
Keywords
near-data processing; in-storage computing; SSD
DOI
10.1109/ISCA.2016.23
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data-intensive queries are common in business intelligence, data warehousing and analytics applications. Typically, processing such a query requires the CPUs to inspect large data sets held in storage in their entirety. An intuitive way to speed up these queries is to reduce the volume of data transferred over the storage network to the host system. This can be achieved by filtering out extraneous data within the storage itself, which motivates a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application that runs on the host system and the storage system in a distributed, yet seamless, manner. To offer a high-level programming model, Biscuit builds on the concept of data flow: data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish between tasks that run on the host system and tasks that run on the storage system. As a result, Biscuit has desirable traits such as generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system running the Linux OS and on a high-performance solid-state drive, and we demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is performed by hardware in the solid-state drive, the average speed-up obtained for the top five TPC-H queries is over 15x.
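The dataflow idea in the abstract (tasks connected by typed, order-preserving ports, with the same task code placeable on either side of the host-storage boundary) can be illustrated with a toy, single-process sketch. Every name here (`Row`, `Port`, `scan_task`, `filter_task`, `sum_task`, `run_query`) is hypothetical and is not Biscuit's actual API. The "link" is only a counter standing in for the storage interconnect, so the sketch shows how moving a filter into storage shrinks the transferred data, not how Biscuit implements it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Row:
    order_id: int
    price: float


class Port:
    """Hypothetical typed FIFO port: rejects wrong item types, preserves send order."""

    def __init__(self, item_type: type, crosses_link: bool = False) -> None:
        self.item_type = item_type
        self.crosses_link = crosses_link  # True if this port models the storage-to-host link
        self.items_sent = 0               # number of items pushed through this port
        self._queue: Deque[object] = deque()

    def put(self, item: object) -> None:
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}, got {type(item).__name__}")
        self._queue.append(item)
        self.items_sent += 1

    def drain(self) -> Iterator[object]:
        # Yields items in exactly the order they were put (data-ordered).
        while self._queue:
            yield self._queue.popleft()


def scan_task(table: Iterable[Row], out: Port) -> None:
    """Reads every row of the table; always placed in storage."""
    for row in table:
        out.put(row)


def filter_task(pred: Callable[[Row], bool], inp: Port, out: Port) -> None:
    """Forwards matching rows; the same code runs unchanged on host or device."""
    for row in inp.drain():
        if pred(row):
            out.put(row)


def sum_task(inp: Port) -> float:
    """Aggregates prices; always placed on the host."""
    return sum(row.price for row in inp.drain())


def run_query(table: List[Row], pred: Callable[[Row], bool],
              filter_in_storage: bool) -> Tuple[float, int]:
    # The link sits after the filter if it runs in storage, before it otherwise.
    scan_out = Port(Row, crosses_link=not filter_in_storage)
    filt_out = Port(Row, crosses_link=filter_in_storage)
    scan_task(table, scan_out)
    filter_task(pred, scan_out, filt_out)
    total = sum_task(filt_out)
    transferred = sum(p.items_sent for p in (scan_out, filt_out) if p.crosses_link)
    return total, transferred
```

For example, with `table = [Row(i, float(i)) for i in range(100)]` and the predicate `lambda r: r.price >= 90`, both placements return a total of 945.0, but filtering on the host moves 100 rows across the link while filtering in storage moves only 10. Only the port wiring changes between the two plans, which mirrors the abstract's point that tasks need not be written differently for host and storage.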
Pages: 153-165
Page count: 13