GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs

Cited by: 32

Authors:

Zheng, Mai [1]

Ravi, Vignesh T. [1]

Qin, Feng [1]

Agrawal, Gagan [1]

Affiliation:

[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA

Source:

ACM SIGPLAN NOTICES | 2011 / Vol. 46 / No. 08

Keywords:

CUDA; Concurrency; Data Race; GPU; Multithreading; Algorithm; Design; Reliability;

DOI:

10.1145/2038037.1941574

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

In recent years, GPUs have emerged as an extremely cost-effective means of achieving high performance. Many application developers, including those with no prior parallel programming experience, are now trying to scale their applications using GPUs. While languages like CUDA and OpenCL have eased GPU programming for non-graphical applications, they are still explicitly parallel languages. All parallel programmers, particularly novices, need tools that can help ensure the correctness of their programs. As in any multithreaded environment, data races on GPUs can severely affect program reliability. Thus, tool support for detecting race conditions can significantly benefit GPU application developers. Existing approaches for detecting data races on CPUs or GPUs have one or more of the following limitations: 1) being ill-suited for handling non-lock synchronization primitives on GPUs; 2) lacking scalability due to the state explosion problem; 3) reporting many false positives because of simplified modeling; and/or 4) incurring prohibitive runtime and space overhead. In this paper, we propose GRace, a new mechanism for detecting races in GPU programs that combines static analysis with a carefully designed dynamic checker for logging and analyzing information at runtime. Our design utilizes the GPU's memory hierarchy to log runtime data accesses efficiently. To improve performance, GRace leverages static analysis to reduce the number of statements that need to be instrumented. Additionally, by exploiting knowledge of thread scheduling and the execution model in the underlying GPUs, GRace can accurately detect data races with no false positives reported. Based on the above idea, we have built a prototype of GRace with two schemes, i.e., GRace-stmt and GRace-addr, for NVIDIA GPUs. Both schemes are integrated with the same static analysis.
We have evaluated GRace-stmt and GRace-addr with three data race bugs in three GPU kernel functions and have also compared them with an existing approach, referred to as B-tool. Our experimental results show that both schemes of GRace are effective in detecting all evaluated cases with no false positives, whereas B-tool reports many false positives for one evaluated case. On the one hand, GRace-addr incurs low runtime overhead, i.e., 22-116%, and low space overhead, i.e., 9-18 MB, for the evaluated kernels. On the other hand, GRace-stmt offers more help in diagnosing data races, at the cost of larger overhead.
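
As a rough illustration of what a dynamic checker over logged accesses might look like, the sketch below scans a log of memory accesses and flags pairs that touch the same address from different warps where at least one access is a write. This is not GRace's actual algorithm or data layout: the `Access` record, the `find_inter_warp_races` helper, and the assumption that the log covers a single barrier-free interval are all hypothetical, and accesses within one warp are skipped purely as a simplification.

```python
from collections import defaultdict
from typing import NamedTuple


class Access(NamedTuple):
    """One logged memory access (hypothetical record layout)."""
    warp: int        # id of the warp that issued the access
    addr: int        # memory address that was touched
    is_write: bool   # True for a store, False for a load
    stmt: str        # label of the source statement, used for diagnosis


def find_inter_warp_races(log):
    """Return sorted (addr, stmt_a, stmt_b) tuples for conflicting accesses.

    Two accesses conflict when they hit the same address, come from
    different warps, and at least one of them is a write. The log is
    assumed to cover one interval with no barrier inside it.
    """
    by_addr = defaultdict(list)
    for acc in log:
        by_addr[acc.addr].append(acc)

    races = set()
    for addr, accesses in by_addr.items():
        for i, a in enumerate(accesses):
            for b in accesses[i + 1:]:
                if a.warp != b.warp and (a.is_write or b.is_write):
                    first, second = sorted((a.stmt, b.stmt))
                    races.add((addr, first, second))
    return sorted(races)


log = [
    Access(warp=0, addr=0x10, is_write=True, stmt="s1"),   # write by warp 0
    Access(warp=1, addr=0x10, is_write=False, stmt="s2"),  # read by warp 1: conflict
    Access(warp=0, addr=0x20, is_write=False, stmt="s3"),  # read/read pair: no conflict
    Access(warp=1, addr=0x20, is_write=False, stmt="s4"),
    Access(warp=0, addr=0x30, is_write=True, stmt="s5"),   # same warp: skipped here
    Access(warp=0, addr=0x30, is_write=True, stmt="s6"),
]
races = find_inter_warp_races(log)
print(races)
```

Recording the statement label alongside each address loosely mirrors the trade-off the abstract describes between GRace-addr and GRace-stmt: tracking more per-access information makes a report easier to diagnose but costs more time and space.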

Pages: 135-145

Page count: 11
