Fcd-cnn: FPGA-based CU depth decision for HEVC intra encoder using CNN

Cited by: 2
Authors
Dehnavi, Hossein [1]
Dehnavi, Mohammad [1]
Klidbary, Sajad Haghzad [2]
Affiliations
[1] Kermanshah Univ Technol, Energy Fac, Dept Elect Engn, Kermanshah, Iran
[2] Univ Zanjan, Dept Elect & Comp Engn, Zanjan, Iran
Keywords
FPGA; Video compression; Hardware architecture; HEVC;
DOI
10.1007/s11554-024-01487-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video compression for storage and transmission has long been a focal point for researchers in image processing. Their efforts aim to reduce the data volume required for video representation while maintaining its quality. HEVC is one of the efficient standards for video compression, receiving special attention due to the increasing demand for high-resolution videos. The main step in video compression involves dividing the coding unit (CU) blocks into smaller blocks that have a uniform texture. In traditional methods, the Discrete Cosine Transform (DCT) is applied, followed by rate-distortion optimization (RDO) to decide on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware uses an efficient, lightweight CNN to partition CUs with low hardware resources in real-time applications. The CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16×16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in the hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs; only the coefficients of the CNN are changed.
In terms of compression quality, the proposed hardware achieves a BD-BR of 4.42% and a BD-PSNR of -0.19 compared to HM16.5. The proposed system can process a 64×64 CU in 4914 clock cycles at 150 MHz. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 flip-flops, 51 BRAMs, and 74 DSPs.
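The timing figures in the abstract can be turned into a latency estimate with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the frame-rate estimate assumes that 64×64 CUs are processed strictly one after another with no pipelining overlap and that partially covered blocks at frame borders cost a full block.

```python
import math

# Figures reported in the abstract.
CLOCK_HZ = 150e6          # 150 MHz clock
CYCLES_PER_CU64 = 4914    # clock cycles to process one 64x64 CU


def cu64_latency_s(cycles=CYCLES_PER_CU64, clock_hz=CLOCK_HZ):
    """Time in seconds to process one 64x64 CU: cycles / frequency."""
    return cycles / clock_hz


def cu64_count(width, height, cu_size=64):
    """Number of 64x64 blocks needed to cover a frame (borders rounded up)."""
    return math.ceil(width / cu_size) * math.ceil(height / cu_size)


def estimated_fps(width, height):
    """Frame-rate estimate. ASSUMPTION: CUs are processed sequentially,
    with no overlap between consecutive CUs."""
    return 1.0 / (cu64_count(width, height) * cu64_latency_s())


if __name__ == "__main__":
    print(f"latency per 64x64 CU: {cu64_latency_s() * 1e6:.2f} us")
    print(f"64x64 CUs in 1920x1080: {cu64_count(1920, 1080)}")
    print(f"estimated frame rate: {estimated_fps(1920, 1080):.1f} fps")
```

Under these assumptions, one 64×64 CU takes 4914 / 150 MHz = 32.76 µs, and a 1920×1080 frame (510 such blocks) would be partitioned at just under 60 frames per second. The paper itself may report different frame-level figures.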
Pages: 10