Temporal-spatial information mining and aggregation for video matting

Authors
Zhiwei Ma
Guilin Yao
Affiliation
[1] Harbin University of Commerce,
Keywords
Double decoder; Spatial continuity; Temporal coherence; Video matting
DOI
Not available
Abstract
Previous video matting methods often require additional auxiliary information and lack temporal consistency. To address these problems, we propose STMI-Net, a novel video matting framework based on temporal-spatial information mining and aggregation. The framework requires no auxiliary information and adopts a double-decoder network structure. One decoder is a recurrent network that makes full use of the temporal information across video frames to ensure temporal coherence in the results. The other decoder is a convolutional network that restores frame-by-frame spatial features in depth to achieve spatial continuity in the results. By aggregating these two kinds of information at the global level, our model achieves 0.0066 MSE on the VideoMatte240K dataset, surpassing the RVM baseline by 13%, and 0.0047 MSE on the PPM-100 portrait matting dataset, surpassing the MG baseline by 26.5%. We also conduct an ablation study to demonstrate the specific roles of the temporal decoder and the spatial decoder in our model.
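The two-branch idea in the abstract can be illustrated with a toy sketch. This is not the STMI-Net implementation: the recurrent decoder is replaced by a simple running average over frames, the convolutional decoder by a 3-tap mean filter within each frame, and the global aggregation by a clamped weighted sum. All names here (`temporal_branch`, `spatial_branch`, `aggregate`, `mse`) and the parameters `momentum` and `weight` are hypothetical, and frames are assumed to be non-empty flat lists of alpha values in [0, 1].

```python
from typing import List

# Toy stand-in for a video: each frame is a flat list of alpha values in [0, 1].
Video = List[List[float]]


def temporal_branch(frames: Video, momentum: float = 0.5) -> Video:
    """Recurrent stand-in: each output mixes the current frame with a carried state."""
    hidden = list(frames[0])  # assumes at least one frame
    outputs: Video = []
    for frame in frames:
        # The state from earlier frames leaks into the current prediction.
        hidden = [momentum * h + (1.0 - momentum) * x for h, x in zip(hidden, frame)]
        outputs.append(list(hidden))
    return outputs


def spatial_branch(frames: Video) -> Video:
    """Convolutional stand-in: a 3-tap mean filter applied to each frame on its own."""
    outputs: Video = []
    for frame in frames:
        n = len(frame)
        smoothed = []
        for i in range(n):
            # Window is truncated at the frame borders instead of padded.
            window = frame[max(0, i - 1):min(n, i + 2)]
            smoothed.append(sum(window) / len(window))
        outputs.append(smoothed)
    return outputs


def aggregate(temporal: Video, spatial: Video, weight: float = 0.5) -> Video:
    """Weighted sum of the two branches, clamped to the valid alpha range [0, 1]."""
    return [
        [min(1.0, max(0.0, weight * t + (1.0 - weight) * s)) for t, s in zip(tf, sf)]
        for tf, sf in zip(temporal, spatial)
    ]


def mse(pred: Video, target: Video) -> float:
    """Mean squared error over every value of every frame."""
    total = 0.0
    count = 0
    for pf, tf in zip(pred, target):
        for p, t in zip(pf, tf):
            total += (p - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    video = [[0.0, 1.0, 1.0, 0.0], [0.0, 1.0, 1.0, 0.0]]
    alpha = aggregate(temporal_branch(video), spatial_branch(video))
    print(mse(alpha, video))
```

In this sketch the temporal branch is the only place where information crosses frame boundaries, and the spatial branch is the only place where neighbouring positions within a frame interact, which mirrors the division of labour the abstract attributes to the two decoders.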
Pages: 29221-29237
Number of pages: 16
Citation
Ma, Zhiwei; Yao, Guilin. Temporal-spatial information mining and aggregation for video matting. Multimedia Tools and Applications, 2024, 83 (10): 29221-29237.