Mapping Streaming Applications on Commodity Multi-CPU and GPU On-Chip Processors

Cited by: 15

Authors:

Vilches, Antonio [1]

Navarro, Angeles [1]

Asenjo, Rafael [1]

Corbera, Francisco [1]

Gran, Ruben [2]

Garzaran, Maria J. [3]

Affiliations:

[1] Univ Malaga, E-29071 Malaga, Spain

[2] Univ Zaragoza, E-50009 Zaragoza, Spain

[3] UIUC, Dept Comp Sci, Urbana, IL USA

Source:

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS | 2016 / Vol. 27 / No. 04

Keywords:

Heterogeneous CPU-GPU chips; pipeline pattern; adaptive mapping; analytical model; energy aware

DOI:

10.1109/TPDS.2015.2432809

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

In this paper, we consider the problem of efficiently executing streaming applications on commodity processors composed of several cores and an on-chip GPU. Streaming applications, such as those in vision and video analytics, consist of a pipeline of stages and are good candidates to take advantage of this type of platform. We also consider that characteristics of the input may change while the application is running. Therefore, we propose a framework that adaptively finds the optimal mapping of the pipeline stages. The core of the framework is an analytical model coupled with information collected at runtime, used to dynamically map each pipeline stage to the most efficient device, taking into consideration both performance and energy. Our experimental results show that, for the evaluated applications running on two different architectures, our model always predicts the best configuration among the evaluated alternatives and significantly reduces the amount of information that needs to be collected at runtime. This best configuration has, on average, 20 percent higher throughput than the configuration recommended by a baseline state-of-the-art approach, while the throughput/energy ratio is 43 percent higher. We have measured improvements in throughput and throughput/energy of up to 81 and 204 percent, respectively, when the model is used to adapt to a video that changes from low to high definition.
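The idea of mapping each pipeline stage to a device under a throughput or throughput/energy objective can be illustrated with a toy sketch. This is not the paper's analytical model: the stage names, the per-item time and energy figures, and the functions `evaluate` and `best_mapping` are all hypothetical, and the cost model (stages on the same device serialize, the two devices overlap fully) is a simplifying assumption made only for illustration.

```python
from itertools import product

# Hypothetical per-item costs for each stage: device -> (time_ms, energy_mJ).
# These numbers are invented for illustration, not measured values.
COSTS = {
    "decode":   {"CPU": (4.0, 8.0),   "GPU": (6.0, 1.0)},
    "filter":   {"CPU": (10.0, 20.0), "GPU": (3.0, 6.0)},
    "classify": {"CPU": (5.0, 10.0),  "GPU": (4.0, 9.0)},
}
DEVICES = ("CPU", "GPU")


def evaluate(mapping):
    """Return (throughput, throughput/energy) for a stage -> device mapping.

    Assumed model: stages sharing a device run back to back, devices run
    in parallel, so the steady-state period is the largest device load.
    """
    load = {d: 0.0 for d in DEVICES}
    energy = 0.0
    for stage, device in mapping.items():
        time_ms, energy_mj = COSTS[stage][device]
        load[device] += time_ms
        energy += energy_mj
    throughput = 1.0 / max(load.values())  # items per ms
    return throughput, throughput / energy


def best_mapping(metric_index):
    """Exhaustively search all mappings; 0 = throughput, 1 = throughput/energy."""
    stages = list(COSTS)
    candidates = (
        dict(zip(stages, devs)) for devs in product(DEVICES, repeat=len(stages))
    )
    return max(candidates, key=lambda m: evaluate(m)[metric_index])


if __name__ == "__main__":
    print("best for throughput:       ", best_mapping(0))
    print("best for throughput/energy:", best_mapping(1))
```

With these invented costs the two objectives select different mappings, which is the kind of trade-off an energy-aware mapping has to resolve. An adaptive variant would refresh `COSTS` from runtime measurements and search again when the input characteristics change.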


Pages: 1099-1115

Number of pages: 17
