Temporal-spatial information mining and aggregation for video matting

Cited by: 0

Authors:

Zhiwei Ma

Guilin Yao

Affiliation:

[1] Harbin University of Commerce

Source:

Multimedia Tools and Applications | 2024 / Vol. 83

Keywords:

Double decoder; Spatial continuity; Temporal coherence; Video matting;

DOI:

Not available

CLC number:

Subject classification code:

Abstract:

Previous video matting methods often require additional auxiliary information and lack temporal consistency. To address these problems, we propose STMI-Net, a novel video matting framework based on temporal-spatial information mining and aggregation. The framework requires no auxiliary information and adopts a double-decoder network structure. One decoder is a recurrent network that exploits the temporal information across video frames to ensure temporal coherence in the results; the other is a convolutional network that restores frame-by-frame spatial features in depth to achieve spatial continuity. By aggregating these two kinds of information at the global level, our model achieves 0.0066 MSE on the VideoMatte240K dataset, surpassing the RVM baseline by 13%, and 0.0047 MSE on the PPM-100 portrait matting dataset, surpassing the MG baseline by 26.5%. We also conduct an ablation study to demonstrate the specific roles of the temporal decoder and the spatial decoder in our model.
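The abstract reports results as MSE values and as percentage gains over a baseline. The following is a minimal sketch of how such figures could be computed, not the authors' evaluation code. It assumes that "surpasses the baseline by X%" means a relative reduction in MSE, which the abstract does not state explicitly. The function names `mse` and `relative_reduction` and the four-pixel mattes are hypothetical and chosen only for illustration.

```python
def mse(pred, truth):
    """Mean squared error between two alpha mattes given as flat lists in [0, 1]."""
    if len(pred) != len(truth) or not pred:
        raise ValueError("mattes must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


def relative_reduction(baseline_error, model_error):
    """Fraction by which model_error is lower than baseline_error.

    Assumption: this is one plausible reading of "surpasses the baseline by X%".
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - model_error) / baseline_error


# Hypothetical 4-pixel mattes, for illustration only (not data from the paper).
truth = [0.0, 1.0, 1.0, 0.0]
baseline_pred = [0.2, 0.8, 0.8, 0.2]
model_pred = [0.1, 0.9, 0.9, 0.1]

baseline_mse = mse(baseline_pred, truth)
model_mse = mse(model_pred, truth)
print(f"baseline MSE: {baseline_mse:.4f}")
print(f"model MSE:    {model_mse:.4f}")
print(f"reduction:    {relative_reduction(baseline_mse, model_mse):.1%}")
```

In practice the error would be averaged over every pixel of every frame in the test set; the flat lists here stand in for those arrays.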


Pages: 29221-29237

Page count: 16

50 records in total

[1] Temporal-spatial information mining and aggregation for video matting
Ma, Zhiwei
Yao, Guilin
MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (10) : 29221 - 29237
[2] Video Object Extraction Integrating Temporal-Spatial Information
Zhu, Shiping
Gao, Jie
PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON ELECTRONIC & MECHANICAL ENGINEERING AND INFORMATION TECHNOLOGY (EMEIT-2012), 2012, 23
[3] Patchwise Temporal-Spatial Feature Aggregation Network for Object Detection in Satellite Video
Zheng, Shangdong
Wu, Zebin
Xu, Yang
Liu, Pengfei
Zheng, Peng
Wei, Zhihui
IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
[4] Deep Temporal-Spatial Aggregation for Video-Based Facial Expression Recognition
Pan, Xianzhang
Guo, Wenping
Guo, Xiaoying
Li, Wenshu
Xu, Junjie
Wu, Jinzhao
SYMMETRY-BASEL, 2019, 11 (01):
[5] Temporal-spatial unpredictable auditory information modulates temporal-spatial coincident audiovisual integration
Li, Qi
Yang, Jingjing
Wu, Jinglong
2013 ICME INTERNATIONAL CONFERENCE ON COMPLEX MEDICAL ENGINEERING (CME), 2013, : 31 - 34
[6] Temporal-spatial optical information processing
Ichioka, Y
Konishi, T
PHOTOREFRACTIVE FIBER AND CRYSTAL DEVICES: MATERIALS, OPTICAL PROPERTIES, AND APPLICATIONS III, 1997, 3137 : 222 - 227
[7] Gaitts: indoor gait recognition with multi-scale temporal-spatial information aggregation
Zhang, Langwen
Men, Zihan
Xie, Wei
SIGNAL IMAGE AND VIDEO PROCESSING, 2025, 19 (01)
[8] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
Ren, Shuhuai
Chen, Sishuo
Li, Shicheng
Sun, Xu
Hou, Lu
FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 932 - 947
[9] Deep Video Matting via Spatio-Temporal Alignment and Aggregation
Sun, Yanan
Wang, Guanzhi
Gu, Qiao
Tang, Chi-Keung
Tai, Yu-Wing
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 6971 - 6980
[10] Temporal-spatial optical information processing and transmission
Ichioka, Y
Konishi, T
SELECTED PAPER FROM INTERNATIONAL CONFERENCE ON OPTICS AND OPTOELECTRONICS '98: SILVER JUBILEE SYMPOSIUM OF THE OPTICAL SOCIETY OF INDIA, 1999, 3729 : 149 - 152
