Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

Cited by: 14
Authors
Su, Huayou [1]
Wen, Mei [1]
Wu, Nan [1]
Ren, Ju [1]
Zhang, Chunyuan [1]
Affiliations
[1] Natl Univ Def Technol, Sch Comp Sci & Sci & Technol, Parallel & Distributed Proc Lab, Changsha 410073, Hunan, Peoples R China
Source
Funding
National High Technology Research and Development Program of China (863 Program)
Keywords
ALGORITHM; DESIGN;
DOI
10.1155/2014/716020
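Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes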
07 ; 0710 ; 09 ;
Abstract
By reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for an H.264/AVC encoder based on a massively parallel architecture. We implemented the proposed framework with CUDA on an NVIDIA GPU. Not only are the compute-intensive components of the H.264 encoder parallelized, but the control-intensive components, such as CAVLC and the deblocking filter, are also realized effectively. In addition, we proposed a series of optimization methods, including multiresolution multiwindow motion estimation, a multilevel parallel strategy that enhances the parallelism of intracoding as much as possible, component-based parallel CAVLC, and a direction-priority deblocking filter. More than 96% of the workload of the H.264 encoder is offloaded to the GPU. Experimental results show that the parallel implementation achieves a speedup of 20 times over the serial program and satisfies the requirement of real-time HD encoding at 30 fps. The loss of PSNR ranges from 0.14 dB to 0.77 dB at the same bitrate. Through analysis of the kernels, we found that the speedup ratios of the compute-intensive algorithms are proportional to the computation power of the GPU. However, the performance of the control-intensive parts (CAVLC) is closely related to the memory bandwidth, which gives an insight for new architecture design.
Pages: 19
Related papers
50 in total
  • [41] A Highly Efficient GPU-CPU Hybrid Parallel Implementation of Sparse LU Factorization
    Liu Li
    Liu Li
    Yang Guangwen
    [J]. CHINESE JOURNAL OF ELECTRONICS, 2012, 21 (01) : 7 - 12
  • [42] An Efficient Graph Isomorphism Algorithm Based on Canonical Labeling and Its Parallel Implementation on GPU
    Wang, Renda
    Guo, Longjiang
    Ai, Chunyu
    Li, Jinbao
    Ren, Meirui
    Li, Keqin
    [J]. 2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 1089 - 1096
  • [43] An Efficient GPU Implementation of CKY Parsing Using the Bitwise Parallel Bulk Computation Technique
    Fujita, Toru
    Nakano, Koji
    Ito, Yasuaki
    Takafuji, Daisuke
    [J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2017, E100D (12): : 2857 - 2865
  • [44] ADVANCED IMPLEMENTATION OF THE FULL RESOLUTION P-SBAS DINSAR PROCESSING CHAIN BASED ON SCALABLE GPU-PARALLEL TECHNIQUES FOR THE EFFICIENT DEFORMATIONS ANALYSIS OF THE BUILT-UP ENVIRONMENT
    Bonano, Manuela
    Buonanno, Sabatino
    Lanari, Riccardo
    Manunta, Michele
    Striano, Pasquale
    Yasir, Muhammad
    Zinno, Ivana
    [J]. 2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1103 - 1106
  • [45] Tools and Techniques for Implementation of Real-time Video Processing Algorithms
    Levent, Vecdi Emre
    Guzel, Aydin E.
    Tosun, Mustafa
    Buyukmihci, Mert
    Aydin, Furkan
    Goren, Sezer
    Erbas, Cengiz
    Akgun, Toygar
    Ugurdag, H. Fatih
    [J]. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY, 2019, 91 (01): : 93 - 113
  • [46] Tools and Techniques for Implementation of Real-time Video Processing Algorithms
    Vecdi Emre Levent
    Aydin E. Guzel
    Mustafa Tosun
    Mert Buyukmihci
    Furkan Aydin
    Sezer Gören
    Cengiz Erbas
    Toygar Akgün
    H. Fatih Ugurdag
    [J]. Journal of Signal Processing Systems, 2019, 91 : 93 - 113
  • [47] Efficient Parallel Processing of All-Pairs Shortest Paths on Multicore and GPU Systems
    Alghamdi, Mohammed H.
    He, Ligang
    Ren, Shenyuan
    Maray, Mohammed
    [J]. IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 2896 - 2908
  • [48] Parallel processing for image and video processing
    AXIS, Institut d'Electronique Fondamentale, Université Paris-Sud, Bâtiment 220, 91405 Orsay Cedex, France
    Unknown
    [J]. Parallel Comput, 2008, 12 (693):
  • [49] AN IMPLEMENTATION OF PARALLEL PROCESSING
    TOWNSEND, HRA
    [J]. COMPUTER JOURNAL, 1987, 30 (02): : 191 - 191
  • [50] An Optimized Message Passing Framework for Parallel Implementation of Signal Processing Applications
    Saha, Sankalita
    Schlessman, Jason
    Puthenpurayil, Sebastian
    Bhattacharyya, Shuvra S.
    Wolf, Wayne
    [J]. 2008 DESIGN, AUTOMATION AND TEST IN EUROPE, VOLS 1-3, 2008, : 1062 - +