TransVCL: Attention-Enhanced Video Copy Localization Network with Flexible Supervision

Cited by: 0
Authors
He, Sifeng [1]
He, Yue [1]
Lu, Minlong [1]
Jiang, Chen [1]
Yang, Xudong [1]
Qian, Feng [1]
Zhang, Xiaobo [1]
Yang, Lei [1]
Zhang, Jiandong [2]
Affiliations
[1] Ant Grp, Wuhan, Peoples R China
[2] Copyright Protect Ctr China, Beijing, Peoples R China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Video copy localization aims to precisely localize all copied segments within a pair of untrimmed videos in video retrieval applications. Previous methods typically start from a frame-to-frame similarity matrix, computed as the cosine similarity between frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on this matrix under temporal constraints. In this paper, we propose TransVCL, an attention-enhanced video copy localization network that is optimized directly from the initial frame-level features and trained end-to-end. It has three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for similarity matrix generation, and a temporal alignment module for copied segment localization. In contrast to previous methods that require a handcrafted similarity matrix, TransVCL incorporates long-range temporal information across the feature sequence pair using self- and cross-attention layers. Because the three components are designed and optimized jointly, the similarity matrix can be learned to exhibit more discriminative copy patterns, leading to significant improvements over previous methods on segment-level labeled datasets (VCSL and VCDB). Besides achieving state-of-the-art performance in the fully supervised setting, the attention architecture enables TransVCL to further exploit unlabeled or only video-level labeled data. Additional experiments that supplement training with video-level labeled datasets, including SVD and FIVR, reveal the high flexibility of TransVCL, from full supervision to semi-supervision (with or without video-level annotations). Code is publicly available at https://github.com/transvcl/TransVCL.
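The abstract contrasts a handcrafted cosine-similarity matrix with a matrix produced after attention-based feature enhancement followed by a correlation and softmax layer. The following is a minimal pure-Python sketch of that data flow, under loudly stated assumptions: the attention below has no learned projections, multiple heads, normalization, or feed-forward layers; the "correlation and softmax" step is assumed to be a dual softmax (row-wise softmax times column-wise softmax, as used in LoFTR-style matchers), which may differ from the paper's exact layer; and the function names, the temperature of 0.1, and the one-hot toy features are hypothetical, not taken from the TransVCL code. The temporal alignment module is not shown.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_similarity_matrix(a: List[Vector], b: List[Vector]) -> Matrix:
    """Handcrafted baseline: entry (i, j) is the cosine similarity of frame i of A and frame j of B."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries: List[Vector], keys: List[Vector]) -> List[Vector]:
    """Parameter-free, single-head scaled dot-product attention with a residual connection.

    SIMPLIFICATION: no learned Q/K/V projections. With keys = queries this is
    self-attention; with the other video's frames as keys it is cross-attention.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(x * y for x, y in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        context = [sum(w * k[c] for w, k in zip(weights, keys)) for c in range(d)]
        out.append([x + y for x, y in zip(q, context)])
    return out


def enhance_pair(a: List[Vector], b: List[Vector]) -> Tuple[List[Vector], List[Vector]]:
    """One self-attention pass within each video, then one cross-attention pass between them."""
    a_self, b_self = attend(a, a), attend(b, b)
    return attend(a_self, b_self), attend(b_self, a_self)


def dual_softmax_matrix(a: List[Vector], b: List[Vector], temperature: float = 0.1) -> Matrix:
    """ASSUMPTION: correlation / temperature, then row-wise softmax times column-wise softmax."""
    corr = [[s / temperature for s in row] for row in cosine_similarity_matrix(a, b)]
    row_sm = [softmax(row) for row in corr]
    col_sm = [softmax(list(col)) for col in zip(*corr)]  # col_sm[j][i]: softmax over frames i of A
    return [[row_sm[i][j] * col_sm[j][i] for j in range(len(b))] for i in range(len(a))]


# Hypothetical toy pair: video B copies frames 1 and 2 of video A (one-hot features for readability).
video_a = [[1.0, 0, 0, 0], [0, 1.0, 0, 0], [0, 0, 1.0, 0], [0, 0, 0, 1.0]]
video_b = [[0, 1.0, 0, 0], [0, 0, 1.0, 0]]

baseline = cosine_similarity_matrix(video_a, video_b)
learned_style = dual_softmax_matrix(*enhance_pair(video_a, video_b))
for j in range(len(video_b)):
    best_i = max(range(len(video_a)), key=lambda i: learned_style[i][j])
    print(f"frame {j} of B best matches frame {best_i} of A")
```

Because the sketch has no trainable weights, its "enhancement" only mixes in context from neighboring and paired frames; what it illustrates is the ordering the abstract describes (features, then self- and cross-attention, then correlation and softmax) rather than a fixed cosine matrix computed once from the raw features.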
Pages: 799-807
Number of pages: 9
Related papers
50 in total
  • [1] Attention-enhanced joint learning network for micro-video venue classification
    Wang, Bing
    Huang, Xianglin
    Cao, Gang
    Yang, Lifang
    Tao, Zhulin
    Wei, Xiaolong
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (05) : 12425 - 12443
  • [2] APLNet: Attention-enhanced progressive learning network
    Zhang, Hui
    Kang, Danqing
    He, Haibo
    Wang, Fei-Yue
    [J]. NEUROCOMPUTING, 2020, 371 : 166 - 176
  • [3] A robust attention-enhanced network with transformer for visual tracking
    Gu, Fengwei
    Lu, Jun
    Cai, Chengtao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (26) : 40761 - 40782
  • [4] Attention-enhanced neural network models for turbulence simulation
    Peng, Wenhui
    Yuan, Zelong
    Wang, Jianchun
    [J]. PHYSICS OF FLUIDS, 2022, 34 (02)
  • [5] Attention-enhanced Graph Convolutional Network for Assessing Rehabilitation Exercises
    Priyadarshani, Smita
    Watanabe, Hiroshi
    [J]. INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY, IWAIT 2024, 2024, 13164
  • [6] Vehicle Anomaly Detection by Attention-Enhanced Temporal Convolutional Network
    He, Zhitao
    Chen, Yongyi
    Zhang, Dan
    Abdulaal, Mohammed
    [J]. 2023 IEEE 6TH INTERNATIONAL CONFERENCE ON INDUSTRIAL CYBER-PHYSICAL SYSTEMS, ICPS, 2023
  • [7] A novel attention-enhanced network for image super-resolution
    Bo, Yangyu
    Wu, Yongliang
    Wang, Xuejun
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 130
  • [8] Attention-enhanced multiscale feature fusion network for pancreas and tumor segmentation
    Dong, Kaiqi
    Hu, Peijun
    Zhu, Yan
    Tian, Yu
    Li, Xiang
    Zhou, Tianshu
    Bai, Xueli
    Liang, Tingbo
    Li, Jingsong
    [J]. MEDICAL PHYSICS, 2024