Efficient Parallel Framework for HEVC Motion Estimation on Many-Core Processors

Cited by: 367
Authors
Yan, Chenggang [1, 2]
Zhang, Yongdong [1]
Xu, Jizheng [3]
Dai, Feng [1]
Zhang, Jun [1]
Dai, Qionghai [2]
Wu, Feng [3]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100190, Peoples R China
[2] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
[3] Microsoft Res Asia, Beijing 100190, Peoples R China
Keywords
Coding efficiency; degree of parallelism (DP); efficient parallel framework; High Efficiency Video Coding (HEVC); many-core processors; motion estimation (ME); HIGHLY PARALLEL; DEBLOCKING FILTER; VIDEO; COMPLEXITY; DECISION
DOI
10.1109/TCSVT.2014.2335852
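Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809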
Abstract
High Efficiency Video Coding (HEVC) provides higher coding efficiency than previous video coding standards at the cost of increased encoding complexity. The complexity increase of the motion estimation (ME) procedure is particularly significant given the complicated partitioning structure of HEVC, so fully exploiting the coding efficiency of HEVC requires a huge amount of computation. In this paper, we analyze the ME structure in HEVC and propose a parallel framework to decouple ME for different partitions on many-core processors. Based on the local parallel method (LPM), we first use a directed acyclic graph (DAG)-based order to parallelize coding tree units (CTUs) and adopt an improved LPM (ILPM) within each CTU (DAGILPM), which exploits both CTU-level and prediction unit (PU)-level parallelism. We then find that there exist completely independent PUs (CIPUs) and partially independent PUs (PIPUs). When the degree of parallelism (DP) is smaller than the maximum DP of DAGILPM, we process the CIPUs and PIPUs, which further increases the DP. The data dependencies and coding efficiency stay the same as in LPM. Experiments show that on a 64-core system, compared with serial execution, the proposed scheme achieves speedups of more than 30 and 40 times for 1920 × 1080 and 2560 × 1600 video sequences, respectively.
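The DAG-based CTU ordering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a 64 × 64 CTU size and assumes that each CTU depends on its left, top-left, top, and top-right neighbours (a common wavefront-style pattern); the abstract states neither. The function names `ctu_dependencies` and `dag_levels` are hypothetical. The sketch groups CTUs into levels whose members have no mutual dependencies, so the largest level gives the CTU-level degree of parallelism (DP) under these assumptions.

```python
import math
from collections import defaultdict


def ctu_dependencies(row, col, rows, cols):
    """Return the in-frame neighbours a CTU is ASSUMED to depend on:
    left, top-left, top, and top-right (not taken from the paper)."""
    candidates = [
        (row, col - 1),
        (row - 1, col - 1),
        (row - 1, col),
        (row - 1, col + 1),
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


def dag_levels(rows, cols):
    """Group CTUs into DAG levels. A CTU's level is one more than the
    deepest level among its dependencies, so CTUs that share a level
    can be processed in parallel."""
    level = {}
    # Raster order guarantees every dependency is visited first.
    for r in range(rows):
        for c in range(cols):
            deps = ctu_dependencies(r, c, rows, cols)
            level[(r, c)] = 1 + max((level[d] for d in deps), default=-1)
    groups = defaultdict(list)
    for ctu, lv in level.items():
        groups[lv].append(ctu)
    return [groups[lv] for lv in sorted(groups)]


# Example: a 1920 x 1080 frame with the assumed 64 x 64 CTU size.
CTU_SIZE = 64
rows = math.ceil(1080 / CTU_SIZE)  # 17 CTU rows
cols = math.ceil(1920 / CTU_SIZE)  # 30 CTU columns
levels = dag_levels(rows, cols)
max_dp = max(len(group) for group in levels)
print(len(levels), max_dp)  # prints "62 15"
```

Under these assumptions, CTU-level parallelism alone peaks at 15 concurrent CTUs for a 1920 × 1080 frame, well below 64 cores, which is consistent with the abstract's point that PU-level parallelism is also needed to raise the DP.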
Pages: 2077-2089
Page count: 13
Related papers
50 in total
  • [1] Highly Parallel Framework for HEVC Motion Estimation on Many-core Platform
    Yan, Chenggang
    Zhang, Yongdong
    Dai, Feng
    Li, Liang
    [J]. 2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 63 - 72
  • [2] Efficient Parallel Framework for HEVC Deblocking Filter on Many-core Platform
    Yan, Chenggang
    Zhang, Yongdong
    Dai, Feng
    Li, Liang
    [J]. 2013 DATA COMPRESSION CONFERENCE (DCC), 2013, : 530 - 530
  • [3] A Highly Parallel Framework for HEVC Coding Unit Partitioning Tree Decision on Many-core Processors
    Yan, Chenggang
    Zhang, Yongdong
    Xu, Jizheng
    Dai, Feng
    Li, Liang
    Dai, Qionghai
    Wu, Feng
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2014, 21 (05) : 573 - 576
  • [4] Efficient parallel HEVC intra-prediction on many-core processor
    Yan, C.
    Zhang, Y.
    Dai, F.
    Zhang, J.
    Li, L.
    Dai, Q.
    [J]. ELECTRONICS LETTERS, 2014, 50 (11) : 805 - U53
  • [5] Many-Core HEVC Encoding Based on Wavefront Parallel Processing and GPU-accelerated Motion Estimation
    Radicke, Stefan
    Hahn, Jens-Uwe
    Wang, Qi
    Grecos, Christos
    [J]. E-BUSINESS AND TELECOMMUNICATIONS, ICETE 2014, 2015, 554 : 393 - 417
  • [6] Parallel deblocking filter for HEVC on many-core processor
    Yan, Chenggang
    Zhang, Yongdong
    Dai, Feng
    Wang, Xi
    Li, Liang
    Dai, Qionghai
    [J]. ELECTRONICS LETTERS, 2014, 50 (05) : 367 - +
  • [7] Efficient Fault Simulation on Many-Core Processors
    Kochte, Michael A.
    Schaal, Marcel
    Wunderlich, Hans-Joachim
    Zoellin, Christian G.
    [J]. PROCEEDINGS OF THE 47TH DESIGN AUTOMATION CONFERENCE, 2010, : 380 - 385
  • [8] Parallelizing Compilation Framework for Heterogeneous Many-core Processors
    Li, Yan-Bing
    Zhao, Rong-Cai
    Han, Lin
    Zhao, Jie
    Xu, Jin-Long
    Li, Ying-Ying
    [J]. Ruan Jian Xue Bao/Journal of Software, 2019, 30 (04) : 981 - 1001
  • [9] Parallel space saving on multi- and many-core processors
    Cafaro, Massimo
    Pulimeno, Marco
    Epicoco, Italo
    Aloisio, Giovanni
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2018, 30 (07):
  • [10] Reducing the burden of parallel loop schedulers for many-core processors
    Arif, Mahwish
    Vandierendonck, Hans
    [J]. CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2021, 33 (13):