P4GPU: Acceleration of Programmable Data Plane Using a CPU-GPU Heterogeneous Architecture

Cited by: 0

Authors:

Li, Peilong [1]

Luo, Yan [1]

Affiliations:

[1] Univ Massachusetts Lowell, Dept Elect & Comp Engn, Lowell, MA 01852 USA

Source:

2016 IEEE 17TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE SWITCHING AND ROUTING (HPSR) | 2016

Keywords:

Programmable Data Plane; Heterogeneous Architecture; Packet Processing; P4; IP Lookup

DOI:

Not available

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Subject classification codes:

0808; 0809

Abstract:

The programmability of the network data plane has become one of the most desirable features within the context of software-defined networks, with P4 serving as a domain-specific language for defining data plane processing. In this work, we are motivated to address the challenges of mapping a P4-defined data plane to a heterogeneous programmable hardware architecture consisting of both a CPU and a GPU, which includes a salient parallel SIMD architecture for processing network flows. We first design a toolset that can be used to map a P4 program onto the proposed architecture. We then optimize the GPU kernel designs for "match-action" primitives and present latency-hiding techniques to reduce the overheads of CPU/GPU communication. In addition, load balancing is investigated to maximize the utilization of CPU and GPU resources. Our toolset and optimizations allow a P4 program to render promising performance on the given heterogeneous architecture. Specifically, the experimental results collected on our prototype systems show that the automatically configured GPU kernels achieve scalable lookup and classification speeds with 420 million IP lookups per second, and more than 60 million classifications per second (for 4K firewall rules).

Pages: 168-175

Page count: 8
