LD: Low-Overhead GPU Race Detection Without Access Monitoring

Cited by: 10
Authors
Li, Pengcheng [1]
Hu, Xiaoyu [1]
Chen, Dong [1]
Brock, Jacob [1]
Luo, Hao [1]
Zhang, Eddy Z. [2]
Ding, Chen [1]
Affiliations
[1] Univ Rochester, POB 270226,CSB Bldg, Rochester, NY 14627 USA
[2] Rutgers State Univ, Dept Comp Sci, 110 Frelinghuysen Rd, Piscataway, NJ 08854 USA
Funding
U.S. National Science Foundation;
Keywords
GPU race detection; low overhead; value-based checking; instrumentation-free;
DOI
10.1145/3046678
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Data race detection has become an important problem in GPU programming. Previous designs of CPU race-checking tools are mainly task parallel and incur high overhead on GPUs due to access instrumentation, especially when monitoring the many thousands of threads routinely used by GPU programs. This article presents a novel data-parallel solution designed and optimized for the GPU architecture. It includes compiler support and a set of runtime techniques. It uses value-based checking, which detects the races reported in previous work, finds new races, and supports race-free deterministic GPU execution. More importantly, race checking is massively data parallel and does not introduce divergent branching or atomic synchronization. Its slowdown is less than 5x for over half of the tests and 10x on average, which is orders of magnitude more efficient than the cuda-memcheck tool by Nvidia and the methods that use fine-grained access instrumentation.
Pages: 25