Fcd-cnn: FPGA-based CU depth decision for HEVC intra encoder using CNN

Cited by: 2
Authors
Dehnavi, Hossein [1]
Dehnavi, Mohammad [1]
Klidbary, Sajad Haghzad [2]
Affiliations
[1] Kermanshah Univ Technol, Energy Fac, Dept Elect Engn, Kermanshah, Iran
[2] Univ Zanjan, Dept Elect & Comp Engn, Zanjan, Iran
Keywords
FPGA; Video compression; Hardware architecture; HEVC;
DOI
10.1007/s11554-024-01487-9
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video compression for storage and transmission has always been a focal point for researchers in the field of image processing. Their efforts aim to reduce the data volume required for video representation while maintaining its quality. HEVC is one of the efficient standards for video compression, receiving special attention due to the increasing demand for high-resolution videos. The main step in video compression involves dividing the coding unit (CU) blocks into smaller blocks that have a uniform texture. In traditional methods, the Discrete Cosine Transform (DCT) is applied, followed by the use of RDO for decision-making on partitioning. This paper presents a novel convolutional neural network (CNN) and its hardware implementation as an alternative to DCT, aimed at speeding up partitioning and reducing the hardware resources required. The proposed hardware utilizes an efficient and lightweight CNN to partition CUs with low hardware resources in real-time applications. This CNN is trained for different Quantization Parameters (QPs) and block sizes to prevent overfitting. Furthermore, the system's input size is fixed at 16×16, and other input sizes are scaled to this dimension. Loop unrolling, data reuse, and resource sharing are applied in hardware implementation to save resources. The hardware architecture is fixed for all block sizes and QPs, and only the coefficients of the CNN are changed.
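The abstract does not say how blocks of other sizes are scaled to the fixed 16×16 input. The following is a minimal sketch of one possible approach, assuming block averaging for larger blocks and pixel replication for smaller ones; the function name `scale_to_16` and both resampling choices are illustrative assumptions, not the authors' method.

```python
def scale_to_16(block):
    """Resample a square luma block (list of rows) to 16x16.

    Assumption: block side is 8, 16, 32 or 64. Larger blocks are
    averaged over f x f tiles; smaller blocks are pixel-replicated.
    """
    n = len(block)
    if n == 16:
        return [row[:] for row in block]
    if n > 16:
        f = n // 16  # downscale factor (2 for 32x32, 4 for 64x64)
        return [
            [
                sum(block[r * f + i][c * f + j]
                    for i in range(f) for j in range(f)) / (f * f)
                for c in range(16)
            ]
            for r in range(16)
        ]
    f = 16 // n  # upscale factor (2 for 8x8)
    return [[block[r // f][c // f] for c in range(16)] for r in range(16)]
```

With a scheme like this, the network input has the same shape for every block size, which is consistent with the abstract's statement that only the CNN coefficients change between block sizes and QPs.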
In terms of compression quality, the proposed hardware achieves a 4.42% BD-BR and -0.19 BD-PSNR compared to HM16.5. The proposed system can process a 64×64 CU in 4914 clock cycles at 150 MHz. The hardware resources utilized by the proposed system include 13,141 LUTs, 15,885 Flip-flops, 51 BRAMs, and 74 DSPs.
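The reported cycle count and clock frequency can be turned into a rough throughput estimate. The sketch below uses only the two figures from the abstract (4914 cycles per 64×64 CU, 150 MHz); the helper names and the assumption that 64×64 CUs are processed one after another with no overlap or stalls are illustrative, and the result covers the depth-decision stage only, not a complete encoder.

```python
CLOCK_HZ = 150e6        # clock frequency reported in the abstract
CYCLES_PER_CU64 = 4914  # cycles per 64x64 CU reported in the abstract


def cu64_time_us(cycles=CYCLES_PER_CU64, clock_hz=CLOCK_HZ):
    """Time to process one 64x64 CU, in microseconds."""
    return cycles / clock_hz * 1e6


def cu64_per_frame(width, height, size=64):
    """Number of 64x64 CUs covering a frame (partial blocks count as whole)."""
    return -(-width // size) * -(-height // size)  # ceiling division


def depth_decision_fps(width, height):
    """Upper bound on frames per second, assuming strictly sequential CUs."""
    return CLOCK_HZ / (CYCLES_PER_CU64 * cu64_per_frame(width, height))
```

Under these assumptions, one 64×64 CU takes about 32.76 µs. A 1920×1080 frame needs 30 × 17 = 510 such CUs, which gives a bound of just under 60 frames per second for the depth decision alone.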
Pages: 10