Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

Cited by: 14
Authors
Su, Huayou [1]
Wen, Mei [1]
Wu, Nan [1]
Ren, Ju [1]
Zhang, Chunyuan [1]
Affiliation
[1] Natl Univ Def Technol, Sch Comp Sci & Sci & Technol, Parallel & Distributed Proc Lab, Changsha 410073, Hunan, Peoples R China
Source
Funding
National High Technology Research and Development Program of China (863 Program);
Keywords
ALGORITHM; DESIGN;
DOI
10.1155/2014/716020
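Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09;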
Abstract
By reorganizing the execution order and optimizing the data structures, we propose an efficient parallel framework for the H.264/AVC encoder on massively parallel architectures, and we implement it with CUDA on NVIDIA GPUs. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also implemented efficiently. In addition, we propose a series of optimization methods: multiresolution multiwindow motion estimation, a multilevel parallel strategy that exposes as much parallelism as possible in intra coding, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the H.264 encoder workload is offloaded to the GPU. Experimental results show that the parallel implementation achieves a speedup of 20 times over the serial program and meets the requirement of real-time HD encoding at 30 fps. At the same bitrate, the PSNR loss ranges from 0.14 dB to 0.77 dB. Analysis of the kernels shows that the speedup of the compute-intensive algorithms is proportional to the computational power of the GPU, whereas the performance of the control-intensive parts (CAVLC) depends strongly on memory bandwidth, which offers insight for new architecture design.
Pages: 19
Related papers
50 in total
  • [21] PARALLEL PROCESSING OF DCT ON GPU
    Tokdemir, Serpil
    Belkasim, S.
    [J]. 2011 DATA COMPRESSION CONFERENCE (DCC), 2011, : 479 - 479
  • [22] Implementation of a Parallel GPU-Based Space-Time Kriging Framework
    Zhang, Yueheng
    Zheng, Xinqi
    Wang, Zhenhua
    Ai, Gang
    Huang, Qing
    [J]. ISPRS INTERNATIONAL JOURNAL OF GEO-INFORMATION, 2018, 7 (05)
  • [23] Design and Implementation of Multi Agent Simulation Library MasCUDA for Parallel Processing with GPU
    Ohiwa, Akira
    Haga, Hirohide
    [J]. PROCEEDINGS OF 2019 2ND INTERNATIONAL CONFERENCE ON ELECTRONICS AND ELECTRICAL ENGINEERING TECHNOLOGY (EEET 2019), 2019, : 13 - 18
  • [24] Generic framework for parallel and distributed processing of video-data
    Farin, Dirk
    de With, Peter H. N.
    [J]. FRONTIERS OF HIGH PERFORMANCE COMPUTING AND NETWORKING - ISPA 2006 WORKSHOPS, PROCEEDINGS, 2006, 4331 : 823 - +
  • [25] FEVES: Framework for Efficient Parallel Video Encoding on Heterogeneous Systems
    Ilic, Aleksandar
    Momcilovic, Svetislav
    Roma, Nuno
    Sousa, Leonel
    [J]. 2014 43RD INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING (ICPP), 2014, : 20 - 29
  • [26] The design and implementation of image parallel processing framework based on Hadoop
    Wang, Shenkuo
    Wu, Shaofei
    Zhang, Huajie
    Xia, Ning
    [J]. BASIC & CLINICAL PHARMACOLOGY & TOXICOLOGY, 2020, 126 : 183 - 183
  • [27] A Parallel Error Diffusion Implementation on a GPU
    Zhang, Yao
    Recker, John Ludd
    Ulichney, Robert
    Beretta, Giordano B.
    Tastl, Ingeborg
    Lin, I-Jong
    Owens, John D.
    [J]. PARALLEL PROCESSING FOR IMAGING APPLICATIONS, 2011, 7872
  • [28] Implementation of a parallel tree method on a GPU
    Nakasato, Naohito
    [J]. JOURNAL OF COMPUTATIONAL SCIENCE, 2012, 3 (03) : 132 - 141
  • [29] OmniDB: Towards Portable and Efficient Query Processing on Parallel CPU/GPU Architectures
    Zhang, Shuhao
    He, Jiong
    He, Bingsheng
    Lu, Mian
    [J]. PROCEEDINGS OF THE VLDB ENDOWMENT, 2013, 6 (12) : 1374 - 1377
  • [30] A parallel Bees Algorithm implementation on GPU
    Luo, Guo-Heng
    Huang, Sheng-Kai
    Chang, Yue-Shan
    Yuan, Shyan-Ming
    [J]. JOURNAL OF SYSTEMS ARCHITECTURE, 2014, 60 (03) : 271 - 279