GPU-Optimized Volume Ray Tracing for Massive Numbers of Rays in Radiotherapy

Cited by: 1
Authors
Zhou, Bo [1, 2]
Xiao, Kai [3]
Chen, Danny Z. [3]
Hu, X. Sharon [3]
Affiliations
[1] Univ Maryland, Sch Med, Dept Radiat Oncol, College Pk, MD 20742 USA
[2] Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China
[3] Univ Notre Dame, Dept Comp Sci & Engn, Notre Dame, IN 46556 USA
Keywords
Design; Algorithms; Performance; Graphics processing unit; ray tracing; application specific design; application acceleration; dose calculation; implementation; convolution; IMRT
DOI
10.1145/2539036.2539038
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Ray tracing within a uniform grid volume is a fundamental process invoked frequently by many applications, especially radiation-dose calculation methods in radiotherapy. However, the mismatch between the GPU memory architecture and the memory access patterns of volume ray tracing leads to inefficient use of GPU memory bandwidth and wastes the capability of modern GPUs. To improve ray tracing performance on GPUs, we propose a lookup-table-based ray tracing method that is specially optimized for the GPU memory system when processing a massive number of rays. The proposed method is based on the key observation that many of these applications involve a massive number of rays, but their ray tracing may not need to follow a specific execution order. Therefore, we divide the 3D space into many regions (called pyramids) and group together the rays falling into the same pyramid. For each ray group, the volume is rotated and resampled before the rays in that group are traced. This divide-and-rotate strategy allows the memory accesses of the ray tracing process to follow a table-lookup pattern and leads to better memory coalescing on the GPU. The proposed method was evaluated in four volume setups with randomly generated rays. The collapsed-cone convolution/superposition (CCCS) dose calculation method was also implemented with and without the proposed approach to verify the feasibility of our method. Compared with a direct GPU implementation of the popular 3DDDA algorithm, our method provides a speedup of 1.91-2.94X for the volume settings we used. Major performance factors, including ray origins, volume size, and pyramid size, are also analyzed. The proposed technique also gives a speedup of 1.61-2.17X over the original GPU implementation of the CCCS algorithm.
Our experimental results indicate that the proposed approach offers better-coalesced memory access, which in turn boosts ray tracing performance on the GPU. Moreover, our approach is conceptually simple and can be readily incorporated into various applications.
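The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the simplest possible partition, six axis-aligned pyramids chosen by the dominant component of each ray direction, whereas the paper treats pyramid size as a tunable parameter and may subdivide further. The names `pyramid_index` and `group_rays` are hypothetical, and the rotation, resampling, and GPU lookup-table stages are not shown.

```python
from collections import defaultdict


def pyramid_index(direction):
    """Map a ray direction to one of six axis-aligned pyramids (0..5).

    Assumption for illustration only: a pyramid is identified by the
    dominant axis of the direction and the sign along that axis,
    index = 2 * axis + (0 if the component is non-negative else 1).
    """
    axis = max(range(3), key=lambda i: abs(direction[i]))
    return 2 * axis + (0 if direction[axis] >= 0 else 1)


def group_rays(rays):
    """Group ray ids by pyramid so each group can be processed together.

    `rays` is a list of (origin, direction) pairs. Processing order
    across groups is free, which is the property the abstract relies on.
    """
    groups = defaultdict(list)
    for ray_id, (_origin, direction) in enumerate(rays):
        groups[pyramid_index(direction)].append(ray_id)
    return dict(groups)


# Four example rays with made-up origins and directions.
rays = [
    ((0, 0, 0), (1.0, 0.2, 0.1)),   # dominant +x
    ((0, 0, 0), (0.9, -0.3, 0.0)),  # dominant +x
    ((5, 5, 5), (0.1, -2.0, 0.5)),  # dominant -y
    ((1, 2, 3), (0.0, 0.3, -0.4)),  # dominant -z
]
groups = group_rays(rays)
```

In a GPU setting, each group would then be traced against a copy of the volume rotated and resampled for that pyramid, so that neighbouring threads read neighbouring memory locations.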
Pages: 17