Tailored AVX2 Transform Kernels for Versatile Video Coding

Authors
Siivonen, Kari [1]
Sainio, Joose [1]
Mercat, Alexandre [1]
Vanne, Jarno [1]
Affiliations
[1] Tampere Univ, Ultra Video Grp, Tampere, Finland
Funding
Academy of Finland
Keywords
Versatile Video Coding (VVC); transform; complexity reduction; Advanced Vector Extensions 2 (AVX2); practical encoder implementation
DOI
10.1109/NorCAS58970.2023.10305449
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Transform coding tools play an integral part in video codecs due to their substantial impact on coding efficiency. The latest video coding standard, Versatile Video Coding (VVC), makes the most of these tools by introducing new DST7, DCT8, and non-square transforms alongside the conventional DCT2 transform. This paper proposes optimized AVX2 kernels for all these transforms to speed up VVC coding. Unlike existing solutions, our kernels are specially tailored for each VVC transform type and block size. Accelerating our open-source uvg266 VVC encoder with the proposed kernels yields up to a 1.1x speedup under the all-intra (AI) coding condition without any coding overhead. Our implementations make forward DCT2 and DST7/DCT8 transforms 4.0x and 6.7x as fast as their respective scalar implementations in the VTM reference encoder. They also outpace the AVX2 kernels of the practical VVenC encoder by factors of 3.0x and 2.8x. The respective speedups rise up to 5.3x, 11.1x, 3.4x, and 3.0x with inverse transforms.
Pages: 6