Prefetching with Helper Threads for Loosely Coupled Multiprocessor Systems

Cited by: 19
Authors
Lee, Jaejin [1 ]
Jung, Changhee [2 ]
Lim, Daeseob [1 ]
Solihin, Yan [3 ]
Affiliations
[1] Seoul Natl Univ, Sch Engn & Comp Sci, Seoul 151744, South Korea
[2] Georgia Inst Technol, Sch Comp Sci, Atlanta, GA 30332 USA
[3] N Carolina State Univ, Dept Elect & Comp Engn, Raleigh, NC 27695 USA
Funding
U.S. National Science Foundation
Keywords
Helper thread; prefetching; chip multiprocessors; processing-in-memory system;
DOI
10.1109/TPDS.2008.224
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
This paper presents a helper thread prefetching scheme designed to work on loosely coupled processors, such as in a standard chip multiprocessor (CMP) system or an intelligent memory system. Loosely coupled processors have the advantage that resources such as the processor pipeline and L1 cache are not contended for by the application and helper threads, hence preserving the speed of the application. However, interprocessor communication is expensive in such a system, and we present techniques to alleviate this cost. Our approach exploits large loop-based code regions and is based on a new synchronization mechanism between the application and helper threads. This mechanism precisely controls how far ahead the execution of the helper thread can be with respect to the application thread. We found that this control is important for ensuring prefetching timeliness and avoiding cache pollution. To demonstrate that prefetching in a loosely coupled system can be done effectively, we evaluate our prefetching by simulating a standard unmodified CMP system and an intelligent memory system where a simple processor in memory executes the helper thread. Evaluated with nine memory-intensive applications, our scheme with the memory processor in DRAM achieves an average speedup of 1.25. Moreover, our scheme works well in combination with a conventional processor-side sequential L1 prefetcher, resulting in an average speedup of 1.31. In a standard CMP, the scheme achieves an average speedup of 1.33. Using a real CMP system with a shared L2 cache between two cores, our helper thread prefetching plus hardware L2 prefetching achieves an average speedup of 1.15 over hardware L2 prefetching alone for the subset of applications with high L2 cache misses per cycle.
Pages: 1309-1324 (16 pages)