End-to-End Learning of Video Compression Using Spatio-Temporal Autoencoders

Cited by: 9
Authors
Pessoa, Jorge [1]
Aidos, Helena [2]
Tomas, Pedro [1]
Figueiredo, Mario A. T. [3]
Affiliations
[1] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
[2] Univ Lisbon, Fac Ciencias, LASIGE, Lisbon, Portugal
[3] Univ Lisbon, Inst Super Tecn, Inst Telecomunicacoes, Lisbon, Portugal
Keywords
Autoencoder; End-to-End Learning; Video Compression; Motion Prediction Free
DOI
10.1109/sips50750.2020.9195249
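Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract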
Deep learning (DL) is revolutionizing image and video processing and now holds state-of-the-art performance in many tasks. However, video compression has so far resisted the DL revolution. Current attempts rely on complex solutions, interconnecting multiple networks to mimic the different layers of conventional codecs. Since DL approaches usually excel when the models are allowed to learn their own feature set, a different solution is herein proposed: end-to-end learning of a single network, explicitly avoiding motion estimation/prediction. We formalize it as the rate-distortion optimization of a single spatio-temporal autoencoder, by jointly learning a latent-space projection transform, and a synthesis transform for low-bit-rate video compression. The quantizer uses a rounding scheme, relaxed during training, and an entropy estimation technique to enforce an information bottleneck. The obtained video compression network shows competitive performance against standard codecs (MPEG-4 Part 2, H.264/AVC, H.265/HEVC), particularly for low bitrates, even while avoiding the use of any motion prediction/compensation method.
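The abstract mentions a rounding quantizer that is relaxed during training and an entropy estimate that enforces the information bottleneck, but it does not say how either is implemented. The following is only a minimal NumPy sketch of one common way to realize these ideas, not the authors' method. It assumes additive uniform noise as the training-time relaxation, a histogram-based entropy as the rate term, mean squared error as the distortion, and a hypothetical trade-off weight `lam`. All function names are illustrative.

```python
import numpy as np


def quantize(latent, training, rng=None):
    """Round at inference time; during training, add uniform noise in
    [-0.5, 0.5) as a smooth stand-in for rounding (assumed relaxation)."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        return latent + rng.uniform(-0.5, 0.5, size=latent.shape)
    return np.round(latent)


def empirical_entropy_bits(symbols):
    """Average bits per symbol, estimated from the symbol histogram."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def rd_loss(original, reconstruction, symbols, lam):
    """Rate-distortion objective: rate + lam * distortion (MSE assumed)."""
    distortion = float(np.mean((original - reconstruction) ** 2))
    rate = empirical_entropy_bits(symbols)
    return rate + lam * distortion
```

A histogram entropy is not differentiable, so a trainable codec would typically replace it with a learned density model. The sketch only illustrates how the rate and distortion terms combine into a single objective.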
Pages: 276-281
Number of pages: 6