GRace: A Low-Overhead Mechanism for Detecting Data Races in GPU Programs

Cited by: 32
Authors
Zheng, Mai [1]
Ravi, Vignesh T. [1]
Qin, Feng [1]
Agrawal, Gagan [1]
Affiliation
[1] Ohio State Univ, Dept Comp Sci & Engn, Columbus, OH 43210 USA
Keywords
CUDA; Concurrency; Data Race; GPU; Multithreading; Algorithm; Design; Reliability
DOI
10.1145/2038037.1941574
CLC classification number
TP31 [Computer Software]
Discipline classification codes
081202; 0835
Abstract
In recent years, GPUs have emerged as an extremely cost-effective means of achieving high performance. Many application developers, including those with no prior parallel programming experience, are now trying to scale their applications using GPUs. While languages like CUDA and OpenCL have eased GPU programming for non-graphical applications, they are still explicitly parallel languages. All parallel programmers, particularly novices, need tools that can help ensure the correctness of their programs. As in any multithreaded environment, data races on GPUs can severely affect program reliability. Thus, tool support for detecting race conditions can significantly benefit GPU application developers. Existing approaches for detecting data races on CPUs or GPUs have one or more of the following limitations: 1) being ill-suited for handling non-lock synchronization primitives on GPUs; 2) lacking scalability due to the state explosion problem; 3) reporting many false positives because of simplified modeling; and/or 4) incurring prohibitive runtime and space overhead. In this paper, we propose GRace, a new mechanism for detecting races in GPU programs that combines static analysis with a carefully designed dynamic checker for logging and analyzing information at runtime. Our design utilizes the GPU memory hierarchy to log runtime data accesses efficiently. To improve performance, GRace leverages static analysis to reduce the number of statements that need to be instrumented. Additionally, by exploiting knowledge of thread scheduling and the execution model of the underlying GPUs, GRace can accurately detect data races with no false positives reported. Based on these ideas, we have built a prototype of GRace for NVIDIA GPUs with two schemes, GRace-stmt and GRace-addr. Both schemes are integrated with the same static analysis.
We have evaluated GRace-stmt and GRace-addr with three data race bugs in three GPU kernel functions and have also compared them with an existing approach, referred to as B-tool. Our experimental results show that both schemes of GRace are effective in detecting all evaluated cases with no false positives, whereas B-tool reports many false positives for one evaluated case. On the one hand, GRace-addr incurs low runtime overhead (22-116%) and low space overhead (9-18 MB) for the evaluated kernels. On the other hand, GRace-stmt offers more help in diagnosing data races, at the cost of larger overhead.
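The abstract describes the dynamic checker only at a high level. As a rough, hypothetical illustration (not GRace's actual algorithm or log format), the Python sketch below checks shared-memory accesses that have already been logged for one barrier interval of one thread block, using the kind of scheduling knowledge the abstract alludes to: threads in different warps are unordered between barriers, while the threads of a single warp are assumed to run in lockstep. The names `Access`, `warp_of`, and `find_races`, the warp size of 32, the treatment of each `stmt` id as one dynamic statement instance, and the pre-Volta lockstep model are all assumptions of this sketch.

```python
from collections import defaultdict
from dataclasses import dataclass

WARP_SIZE = 32  # assumption: NVIDIA warp width


@dataclass(frozen=True)
class Access:
    """One logged shared-memory access (hypothetical record layout)."""
    thread: int     # flat thread index within the thread block
    stmt: int       # id of the dynamic statement instance that made the access
    addr: int       # shared-memory address that was read or written
    is_write: bool  # True for a store, False for a load


def warp_of(thread: int) -> int:
    """Map a flat thread index to its warp index."""
    return thread // WARP_SIZE


def find_races(log: list[Access]) -> set[tuple[str, int]]:
    """Report (kind, addr) races among accesses logged in ONE barrier interval."""
    races: set[tuple[str, int]] = set()
    by_addr: dict[int, list[Access]] = defaultdict(list)
    for acc in log:
        by_addr[acc.addr].append(acc)

    for addr, accs in by_addr.items():
        # Inter-warp: warps are unordered between barriers, so a write plus
        # any access to the same address from another warp is a race.
        warps = {warp_of(a.thread) for a in accs}
        writer_warps = {warp_of(a.thread) for a in accs if a.is_write}
        if writer_warps and len(warps) > 1:
            races.add(("inter-warp", addr))

        # Intra-warp: under the assumed lockstep model, different statements of
        # one warp are ordered, so only threads storing in the SAME statement conflict.
        store_threads: dict[tuple[int, int], set[int]] = defaultdict(set)
        for a in accs:
            if a.is_write:
                store_threads[(warp_of(a.thread), a.stmt)].add(a.thread)
        if any(len(ts) > 1 for ts in store_threads.values()):
            races.add(("intra-warp", addr))
    return races


if __name__ == "__main__":
    log = [
        Access(thread=0, stmt=1, addr=0x10, is_write=True),
        Access(thread=40, stmt=2, addr=0x10, is_write=False),  # warp 1 reads warp 0's write
        Access(thread=3, stmt=3, addr=0x20, is_write=True),
        Access(thread=5, stmt=3, addr=0x20, is_write=True),    # two lanes of warp 0, same store
        Access(thread=7, stmt=4, addr=0x30, is_write=False),
        Access(thread=70, stmt=4, addr=0x30, is_write=False),  # reads only: no race
    ]
    print(sorted(find_races(log)))  # [('inter-warp', 16), ('intra-warp', 32)]
```

Grouping the log by address loosely resembles an address-oriented scheme, while keeping per-statement records would make it easier to say which statements race, which is the kind of diagnostic help the abstract attributes to GRace-stmt; how the two schemes actually organize their logs is not stated here.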
Pages: 135-145
Page count: 11
Related papers
  • [21] Low-Overhead Implementation of a Soft Decision Helper Data Algorithm for SRAM PUFs
    Maes, Roel
    Tuyls, Pim
    Verbauwhede, Ingrid
    [J]. CRYPTOGRAPHIC HARDWARE AND EMBEDDED SYSTEMS - CHES 2009, PROCEEDINGS, 2009, 5747 : 332 - 347
  • [22] Towards Providing Low-Overhead Data Race Detection for Large OpenMP Applications
    Protze, Joachim
    Atzeni, Simone
    Ahn, Dong H.
    Schulz, Martin
    Gopalakrishnan, Ganesh
    Mueller, Matthias S.
    Laguna, Ignacio
    Rakamaric, Zvonimir
    Lee, Greg L.
    [J]. PROCEEDINGS OF LLVM-HPC 14 2014 LLVM COMPILER INFRASTRUCTURE IN HPC, 2014: 40 - 47
  • [23] A LOW-OVERHEAD LABORATORY DATA MANAGEMENT-SYSTEM FOR THE PDP11
    STILLWELL, RN
    [J]. COMPUTERS AND BIOMEDICAL RESEARCH, 1982, 15 (01): 29 - 38
  • [24] Data forwarding and update propagation in grid network for NDN: A low-overhead approach
    Chatterjee, Tanusree
    Ruj, Sushmita
    DasBit, Sipra
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ADVANCED NETWORKS AND TELECOMMUNICATIONS SYSTEMS (ANTS), 2018
  • [25] A Low-Overhead, Confidentiality-Assured, and Authenticated Data Acquisition Framework for IoT
    Zhang, Yushu
    He, Qi
    Chen, Guo
    Zhang, Xinpeng
    Xiang, Yong
    [J]. IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2020, 16 (12) : 7566 - 7578
  • [26] Sound and Partially-Complete Static Analysis of Data-Races in GPU Programs
    Liew, Dennis
    Cogumbreiro, Tiago
    Lange, Julien
    [J]. Proceedings of the ACM on Programming Languages, 2024, 8 (OOPSLA2)
  • [27] On Low-Overhead and Stable Data Transmission between Channel-Hopping Cognitive Radios
    Wu, Ching-Chan
    Wu, Shan-Hung
    Chen, Wen-Tsuen
    [J]. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2017, 16 (09) : 2574 - 2587
  • [28] Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration
    Sadredini, Elaheh
    Rahimi, Reza
    Imani, Mohsen
    Skadron, Kevin
    [J]. PROCEEDINGS OF 54TH ANNUAL IEEE/ACM INTERNATIONAL SYMPOSIUM ON MICROARCHITECTURE, MICRO 2021, 2021: 311 - 323
  • [29] Combining Deduplication and Delta Compression to Achieve Low-Overhead Data Reduction on Backup Datasets
    Xia, Wen
    Jiang, Hong
    Feng, Dan
    Tian, Lei
    [J]. 2014 DATA COMPRESSION CONFERENCE (DCC 2014), 2014: 203 - 212
  • [30] DSLR-: A low-overhead data structure layout randomization for defending data-oriented programming
    Wei, Jin
    Chen, Ping
    [J]. JOURNAL OF COMPUTER SECURITY, 2024, 32 (03) : 221 - 246