TransVCL: Attention-Enhanced Video Copy Localization Network with Flexible Supervision

Authors
He, Sifeng [1]
He, Yue [1]
Lu, Minlong [1]
Jiang, Chen [1]
Yang, Xudong [1]
Qian, Feng [1]
Zhang, Xiaobo [1]
Yang, Lei [1]
Zhang, Jiandong [2]
Affiliations
[1] Ant Group, Wuhan, People's Republic of China
[2] Copyright Protection Center of China, Beijing, People's Republic of China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Video copy localization aims to precisely localize all the copied segments within a pair of untrimmed videos in video retrieval applications. Previous methods typically start from a frame-to-frame similarity matrix generated by cosine similarity between the frame-level features of the input video pair, and then detect and refine the boundaries of copied segments on the similarity matrix under temporal constraints. In this paper, we propose TransVCL, an attention-enhanced video copy localization network that is optimized directly from the initial frame-level features and trained end-to-end with three main components: a customized Transformer for feature enhancement, a correlation and softmax layer for similarity matrix generation, and a temporal alignment module for copied segment localization. In contrast to previous methods that require a handcrafted similarity matrix, TransVCL incorporates long-range temporal information between the pair of feature sequences using self- and cross-attention layers. With the joint design and optimization of the three components, the similarity matrix can be learned to present more discriminative copied patterns, leading to significant improvements over previous methods on segment-level labeled datasets (VCSL and VCDB). Besides achieving state-of-the-art performance in the fully supervised setting, the attention architecture enables TransVCL to further exploit unlabeled or merely video-level labeled data. Additional experiments that supplement the training data with video-level labeled datasets, including SVD and FIVR, demonstrate the high flexibility of TransVCL from full supervision to semi-supervision (with or without video-level annotation). Code is publicly available at https://github.com/transvcl/TransVCL.
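To make the starting point of the previous methods concrete, the following is a minimal sketch of a handcrafted frame-to-frame similarity matrix built from cosine similarity, followed by a row-wise softmax. It is not the TransVCL implementation: the function names, the toy features, and the `temperature` value are assumptions made for illustration, and the softmax here only hints at the kind of normalization a correlation and softmax layer might apply.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(query_feats, ref_feats):
    """Return an (n_query x n_ref) matrix of cosine similarities between frames."""
    q = [l2_normalize(f) for f in query_feats]
    r = [l2_normalize(f) for f in ref_feats]
    return [[sum(a * b for a, b in zip(qf, rf)) for rf in r] for qf in q]


def row_softmax(matrix, temperature=0.1):
    """Apply a softmax to each row; the temperature is an illustrative assumption."""
    result = []
    for row in matrix:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp((v - peak) / temperature) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


# Toy frame-level features (hypothetical): query frames 1 and 2 are copies
# of reference frames 0 and 1, while query frame 0 matches nothing.
query = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
reference = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

sim = cosine_similarity_matrix(query, reference)
probs = row_softmax(sim)
```

In this toy case the copied segment shows up as a diagonal run of high values in `sim` (rows 1 and 2), which is the pattern that boundary detection on a similarity matrix looks for.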
Pages: 799-807
Page count: 9