Acceleration of large-scale CGH generation using multi-GPU cluster

Cited by: 1

Authors:

Watanabe, Shinpei [1]
Jackin, Boaz Jessie [2]
Ohkawa, Takeshi [1]
Ootsu, Kanemitsu [1]
Yokota, Takashi [1]
Hayasaki, Yoshio [3]
Yatagai, Toyohiko [3]
Baba, Takanobu [3]

Affiliations:

[1] Utsunomiya Univ, Grad Sch Engn, Dept Informat Syst Sci, 7-1-2 Yoto, Utsunomiya, Tochigi 3218585, Japan

[2] Natl Inst Informat & Commun Technol, 4-2-1 Nukuikitamachi, Koganei, Tokyo 1848795, Japan

[3] Utsunomiya Univ, Ctr Opt Res & Educ, 7-1-2 Yoto, Utsunomiya, Tochigi 3218585, Japan

Source:

2017 FIFTH INTERNATIONAL SYMPOSIUM ON COMPUTING AND NETWORKING (CANDAR), 2017

Keywords:

CGH; multi-GPU; cluster; object decomposition method; optimization

DOI:

10.1109/CANDAR.2017.53

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

A computer-generated hologram (CGH) is a promising technology for realizing 3D displays. Large-scale CGH has the advantage of resolving problems of existing 3D displays. However, generating a large-scale CGH requires memory and computation time proportional to the number of pixels. Furthermore, to serve as a display, a CGH must be generated in real time, which is why CGH is not yet suited to practical use. CGH computation consists of data-independent operations, and current GPUs have thousands of processing cores, so GPUs can be expected to accelerate CGH generation. To accelerate CGH generation, we apply several parallelization and optimization techniques to the CGH program for both single-node and multi-node execution. The single-node techniques include the object decomposition method, reduction of the amount of data transferred between CPU and GPU, kernel integration, stream processing, and the use of multi-GPU parallelism. The multi-node optimization includes an inter-node data distribution method. The results show a 134.7-fold speed-up compared with sequential execution on a CPU.
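The abstract attributes the GPU speed-up to CGH computation consisting of data-independent operations. The following is a minimal sketch of why that independence allows decomposition; it is not the authors' implementation. It assumes the widely used point-light-source amplitude formula I(x_h, y_h) = Σ_j A_j cos(π / (λ z_j) · [(x_h − x_j)² + (y_h − y_j)²]), and it assumes "object decomposition" means splitting the object points into independent chunks (one per hypothetical GPU or node) whose partial holograms are summed. The function names (`cgh_partial`, `decompose`, `cgh_decomposed`) and all parameter values are illustrative, and plain Python stands in for CUDA kernels.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical object point: (x, y, z, amplitude), positions in metres.
Point = Tuple[float, float, float, float]
Hologram = List[List[float]]


def cgh_partial(points: Sequence[Point], width: int, height: int,
                pitch: float, wavelength: float) -> Hologram:
    """Sum each point's Fresnel-approximated contribution over a width x height pixel grid."""
    holo = [[0.0] * width for _ in range(height)]
    for xj, yj, zj, aj in points:
        k = math.pi / (wavelength * zj)  # phase scale for this point's depth
        for v in range(height):
            dy2 = (v * pitch - yj) ** 2
            row = holo[v]
            for u in range(width):
                dx = u * pitch - xj
                row[u] += aj * math.cos(k * (dx * dx + dy2))
    return holo


def decompose(points: Sequence[Point], n_parts: int) -> List[Sequence[Point]]:
    """Split object points into at most n_parts contiguous chunks (one per hypothetical GPU/node)."""
    if not points:
        return []
    size = math.ceil(len(points) / n_parts)
    return [points[i:i + size] for i in range(0, len(points), size)]


def cgh_decomposed(points: Sequence[Point], width: int, height: int,
                   pitch: float, wavelength: float, n_parts: int) -> Hologram:
    """Compute each chunk independently (here one after another), then reduce by summation."""
    total = [[0.0] * width for _ in range(height)]
    for chunk in decompose(points, n_parts):
        part = cgh_partial(chunk, width, height, pitch, wavelength)
        for v in range(height):
            for u in range(width):
                total[v][u] += part[v][u]
    return total


if __name__ == "__main__":
    pts = [(1e-5 * i, 2e-5 * i, 0.1 + 0.01 * i, 1.0) for i in range(7)]
    seq = cgh_partial(pts, 8, 6, 8e-6, 532e-9)
    dec = cgh_decomposed(pts, 8, 6, 8e-6, 532e-9, n_parts=3)
    worst = max(abs(a - b) for ra, rb in zip(seq, dec) for a, b in zip(ra, rb))
    print(f"max |sequential - decomposed| = {worst:.2e}")
```

Because each chunk's partial hologram depends only on its own points, the chunks could run concurrently on separate devices, and the only communication needed is the final summation. Splitting the hologram plane among devices instead would remove that reduction step, at the cost of every device reading all object points.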


Pages: 589-593

Page count: 5
