FVIFormer: Flow-Guided Global-Local Aggregation Transformer Network for Video Inpainting

Cited by: 2
Authors
Yan, Weiqing [1]
Sun, Yiqiu [1]
Yue, Guanghui [2]
Zhou, Wei [3]
Liu, Hantao [3]
Affiliations
[1] Yantai Univ, Sch Comp & Control Engn, Yantai 261400, Peoples R China
[2] Shenzhen Univ, Med Sch, Sch Biomed Engn, Shenzhen 518060, Peoples R China
[3] Cardiff Univ, Sch Comp Sci & Informat, Cardiff CF24 4AG, Wales
Funding
National Natural Science Foundation of China;
Keywords
Machine learning--deep learning; OBJECT REMOVAL; IMAGE;
DOI
10.1109/JETCAS.2024.3392972
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline classification codes
0808; 0809;
Abstract
Video inpainting has been widely used in recent years. Existing methods usually exploit the similarity between the missing region and its surrounding features to inpaint the visually damaged content in a multi-stage manner. However, owing to the complexity of video content, these methods may destroy the structural information of objects in the video. Moreover, moving objects within the damaged regions further increase the difficulty of the task. To address these issues, we propose a flow-guided global-local aggregation Transformer network for video inpainting (FVIFormer). First, we use a pre-trained optical flow completion network to repair the defective optical flow of the video frames. Then, we propose a content inpainting module that uses the completed optical flow as a guide and propagates global content across video frames with efficient temporal and spatial Transformers to inpaint the corrupted regions. Finally, we propose a structural rectification module that enhances the coherence of content around the missing regions by combining the extracted local and global features. In addition, to improve the efficiency of the overall framework, we optimise the self-attention mechanism with depth-wise separable encoding, which speeds up both training and testing. We validate our method on the YouTube-VOS and DAVIS video datasets. Extensive experimental results demonstrate the effectiveness of our approach in completing the edges of video content processed by stabilisation algorithms.
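The idea of using a completed optical flow as a guide for filling holes can be illustrated with a minimal sketch, assuming single-channel frames stored as nested lists, a single reference frame, and a completed flow field that maps each target pixel to its correspondence in the reference frame. This is a simplified pixel-level propagation in the style of earlier flow-guided inpainting pipelines, not FVIFormer's feature-level Transformer propagation; the function name `propagate_from_reference` and its arguments are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]               # H x W single-channel intensities
Mask = List[List[bool]]                 # True marks a missing (hole) pixel
Flow = List[List[Tuple[float, float]]]  # per-pixel (dx, dy) from target to reference


def propagate_from_reference(
    target: Frame,
    target_mask: Mask,
    reference: Frame,
    reference_mask: Mask,
    flow_to_reference: Flow,
) -> Tuple[Frame, Mask]:
    """Hypothetical helper: fill hole pixels of `target` by following the
    completed flow into `reference` (nearest-neighbour sampling).

    Returns the filled frame and a mask of pixels that are still missing.
    Inputs are not modified.
    """
    h, w = len(target), len(target[0])
    filled = [row[:] for row in target]
    still_missing = [row[:] for row in target_mask]
    for y in range(h):
        for x in range(w):
            if not target_mask[y][x]:
                continue  # known pixel: keep the original value
            dx, dy = flow_to_reference[y][x]
            # Round the flow-displaced position to the nearest reference pixel.
            rx, ry = int(round(x + dx)), int(round(y + dy))
            # Copy only if the flow lands inside the frame on a valid (non-hole) pixel.
            if 0 <= rx < w and 0 <= ry < h and not reference_mask[ry][rx]:
                filled[y][x] = reference[ry][rx]
                still_missing[y][x] = False
    return filled, still_missing


# Usage: the scene shifts one pixel to the right between target and reference.
target = [[10.0, 0.0, 30.0, 0.0]]
target_mask = [[False, True, False, True]]
reference = [[5.0, 10.0, 20.0, 30.0]]
reference_mask = [[False, False, False, False]]
flow = [[(1.0, 0.0)] * 4]
filled, missing = propagate_from_reference(target, target_mask, reference, reference_mask, flow)
print(filled)   # [[10.0, 20.0, 30.0, 0.0]]
print(missing)  # [[False, False, False, True]]
```

The last hole stays missing because its flow points outside the reference frame. Pixels like this, which no neighbouring frame can supply, have to be synthesised by a learned model; in the pipeline described in the abstract, that role would fall to the Transformer-based content inpainting and structural rectification modules.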
Pages: 235-244
Number of pages: 10
Related papers
50 in total
  • [21] SeamsTalk: Seamless Talking Face Generation via Flow-Guided Inpainting
    Jeong, Yeongho
    Kim, Gyeongman
    Jang, Doohyuk
    Hwang, Jaeryong
    Yang, Eunho
    IEEE ACCESS, 2024, 12 : 46678 - 46689
  • [22] Global-Local Transformer for Brain Age Estimation
    He, Sheng
    Grant, P. Ellen
    Ou, Yangming
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (01) : 213 - 224
  • [23] Temporal Multimodal Graph Transformer With Global-Local Alignment for Video-Text Retrieval
    Feng, Zerun
    Zeng, Zhimin
    Guo, Caili
    Li, Zheng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03) : 1438 - 1453
  • [24] Global-Local Temporal Convolutional Network for Traffic Flow Prediction
    Ren, Yajie
    Zhao, Dong
    Luo, Dan
    Ma, Huadong
    Duan, Pengrui
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (02) : 1578 - 1584
  • [25] Flow-Guided Temporal-Spatial Network for HEVC Compressed Video Quality Enhancement
    Meng, Xiandong
    Deng, Xuan
    Zhu, Shuyuan
    Liu, Shuaicheng
    Zeng, Bing
    2020 DATA COMPRESSION CONFERENCE (DCC 2020), 2020, : 384 - 384
  • [26] FLOW-GUIDED DEFORMABLE ATTENTION NETWORK FOR FAST ONLINE VIDEO SUPER-RESOLUTION
    Yang, Xi
    Zhang, Xindong
    Zhang, Lei
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 390 - 394
  • [27] An effective cross-scenario remote heart rate estimation network based on global-local information and video transformer
    Xiang, Guoliang
    Yao, Song
    Peng, Yong
    Deng, Hanwen
    Wu, Xianhui
    Wang, Kui
    Li, Yingli
    Wu, Fan
    PHYSICAL AND ENGINEERING SCIENCES IN MEDICINE, 2024, 47 (02) : 729 - 739
  • [28] RSSGLT: Remote Sensing Image Segmentation Network Based on Global-Local Transformer
    Kumar, Satyawant
    Kumar, Abhishek
    Lee, Dong-Gyu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [29] An Adaptive Post-Processing Network With the Global-Local Aggregation for Semantic Segmentation
    Zhu, Guilin
    Wang, Runmin
    Liu, Yingying
    Zhu, Zhenlin
    Gao, Changxin
    Liu, Li
    Sang, Nong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (02) : 1159 - 1173
  • [30] Structure Guided Global and Local Attention Transformer for Image Inpainting of Obscured Ships in Maritime Surveillance
    Baek, Woonyoung
    Kang, Sanggil
    Yang, Young-Hoon
    IEEE ACCESS, 2024, 12 : 101999 - 102015