Biscuit: A Framework for Near-Data Processing of Big Data Workloads

Cited by: 159
Authors
Gu, Boncheol [1]
Yoon, Andre S. [1]
Bae, Duck-Ho [1]
Jo, Insoon [1]
Lee, Jinyoung [1]
Yoon, Jonghyun [1]
Kang, Jeong-Uk [1]
Kwon, Moonsang [1]
Yoon, Chanho [1]
Cho, Sangyeun [1]
Jeong, Jaeheon [1]
Chang, Duckhyun [1]
Affiliation
[1] Samsung Electronics Co., Ltd., Memory Business, Suwon, South Korea
Keywords
near-data processing; in-storage computing; SSD
DOI
10.1109/ISCA.2016.23
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Data-intensive queries are common in business intelligence, data warehousing and analytics applications. Typically, processing a query involves full inspection of large in-storage data sets by CPUs. An intuitive way to speed up such queries is to reduce the volume of data transferred over the storage network to a host system. This can be achieved by filtering out extraneous data within the storage, motivating a form of near-data processing. This work presents Biscuit, a novel near-data processing framework designed for modern solid-state drives. It allows programmers to write a data-intensive application to run on the host system and the storage system in a distributed, yet seamless manner. In order to offer a high-level programming model, Biscuit builds on the concept of data flow. Data processing tasks communicate through typed and data-ordered ports. Biscuit does not distinguish tasks that run on the host system and the storage system. As a result, Biscuit has desirable traits like generality and expressiveness, while promoting code reuse and naturally exposing concurrency. We implement Biscuit on a host system that runs the Linux OS and a high-performance solid-state drive. We demonstrate the effectiveness of our approach and implementation with experimental results. When data filtering is done by hardware in the solid-state drive, the average speed-up obtained for the top five queries of TPC-H is over 15x.
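The data-flow idea in the abstract (tasks connected by typed, order-preserving ports, with filtering placed near the data) can be illustrated with a small sketch. This is not Biscuit's API: the names `Port`, `filter_task`, `sum_task`, and `run_demo` are hypothetical, the "storage side" and "host side" are simulated in a single Python process, and the rows are made-up sample data.

```python
from collections import deque
from typing import Callable, Deque, Generic, Iterable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


class Port(Generic[T]):
    """Hypothetical typed FIFO channel: rejects wrong-typed items, keeps order."""

    def __init__(self, item_type: Type[T]) -> None:
        self.item_type = item_type
        self._items: Deque[T] = deque()
        self.sent = 0  # number of items that crossed the port

    def put(self, item: T) -> None:
        if not isinstance(item, self.item_type):
            raise TypeError(f"expected {self.item_type.__name__}")
        self._items.append(item)
        self.sent += 1

    def drain(self) -> Iterator[T]:
        # Yield items in the order they were put (first in, first out).
        while self._items:
            yield self._items.popleft()


Row = Tuple[str, int]


def filter_task(rows: Iterable[Row], keep: Callable[[Row], bool], out: Port) -> None:
    """Stands in for a task running near the data: forwards matching rows only."""
    for row in rows:
        if keep(row):
            out.put(row)


def sum_task(inp: Port, column: int) -> int:
    """Stands in for a host-side task: aggregates whatever arrives on the port."""
    return sum(row[column] for row in inp.drain())


def run_demo() -> Tuple[int, int]:
    rows = [("a", 5), ("b", 50), ("c", 500), ("d", 7)]  # made-up sample data
    port: Port = Port(tuple)
    filter_task(rows, lambda r: r[1] >= 50, port)
    total = sum_task(port, 1)
    return port.sent, total


if __name__ == "__main__":
    sent, total = run_demo()
    print(f"rows transferred: {sent} of 4, total: {total}")
```

In this sketch only the rows that pass the filter cross the port, which mirrors the abstract's motivation of reducing the volume of data transferred to the host. Both tasks are written the same way regardless of where they would run, loosely echoing the statement that tasks on the host and the storage system are not distinguished.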
Pages: 153-165
Number of pages: 13