Dagger: Efficient and Fast RPCs in Cloud Microservices with Near-Memory Reconfigurable NICs

Cited by: 28
Authors
Lazarev, Nikita [1]
Xiang, Shaojie [1]
Adit, Neil [1]
Zhang, Zhiru [1]
Delimitrou, Christina [1]
Affiliations
[1] Cornell Univ, Ithaca, NY 14853 USA
Keywords
End-host networking; cloud computing; datacenters; RPC frameworks; microservices; smartNICs; FPGAs; cache-coherent FPGAs;
DOI
10.1145/3445814.3446696
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The ongoing shift of cloud services from monolithic designs to microservices creates high demand for efficient and high-performance datacenter networking stacks, optimized for fine-grained workloads. Commodity networking systems based on software stacks and peripheral NICs introduce high overheads when it comes to delivering small messages. We present Dagger, a hardware acceleration fabric for cloud RPCs based on FPGAs, where the accelerator is closely coupled with the host processor over a configurable memory interconnect. The three key design principles of Dagger are: (1) offloading the entire RPC stack to an FPGA-based NIC, (2) leveraging memory interconnects instead of PCIe buses as the interface with the host CPU, and (3) making the acceleration fabric reconfigurable, so it can accommodate the diverse needs of microservices. We show that the combination of these principles significantly improves the efficiency and performance of cloud RPC systems while preserving their generality. Dagger achieves 1.3-3.8x higher per-core RPC throughput compared to both highly optimized software stacks and systems using specialized RDMA adapters. It also scales up to 84 Mrps with 8 threads on 4 CPU cores, while maintaining state-of-the-art μs-scale tail latency. We also demonstrate that large third-party applications, like memcached and MICA KVS, can be easily ported to Dagger with minimal changes to their codebase, bringing their median and tail KVS access latency down to 2.8-3.5 μs and 5.4-7.8 μs, respectively. Finally, we show that Dagger is beneficial for multi-tier end-to-end microservices with different threading models by evaluating it using an 8-tier application implementing a flight check-in service.
Pages: 36-51
Page count: 16