Learning spatial-temporal features for video copy detection by the combination of CNN and RNN

Cited by: 28
Authors
Hu, Yaocong [1,2,3]
Lu, Xiaobo [1,2,3]
Affiliations
[1] Southeast Univ, Coll Automat, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
[3] Southeast Univ, Minist Educ, Key Lab Measurement & Control Complex Syst Engn, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Video copyright; CNN; Sequence matching; SiameseLSTM; CLASSIFICATION; WATERMARKING
DOI
10.1016/j.jvcir.2018.05.013
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline classification code
0812
Abstract
Following the rapid development of networked multimedia, online video copyright protection has become a hot topic in recent research. However, video copy detection remains a challenging task in video analysis and computer vision, owing to the large variations in scale and illumination of copied content. In this paper, we propose a novel deep-learning-based approach that jointly uses a Convolutional Neural Network (CNN) and a Recurrent Neural Network (RNN) to solve the specific problem of detecting copied segments in videos. We first utilize a Residual Convolutional Neural Network (ResNet) to extract frame-level content features, and then employ a SiameseLSTM architecture for spatial-temporal fusion and sequence matching. Finally, the copied segments are detected by a graph-based temporal network. We evaluate the proposed CNN-RNN approach on VCDB, a public large-scale video copy dataset; the experimental results demonstrate the effectiveness and robustness of our method, which achieves significant performance improvements over the state of the art.
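The CNN-RNN pipeline summarized in the abstract (frame-level ResNet features, a Siamese LSTM for spatial-temporal fusion and sequence matching) can be illustrated with a minimal PyTorch sketch. The module names, feature dimensions, and similarity head below are illustrative assumptions rather than the authors' released code, and the graph-based temporal network that localizes copied segments is omitted.

import torch
import torch.nn as nn
import torchvision.models as models


class FrameEncoder(nn.Module):
    """Frame-level content features from a ResNet backbone (classifier removed)."""
    def __init__(self):
        super().__init__()
        resnet = models.resnet50()                        # pretrained weights optional
        self.backbone = nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, frames):                            # frames: (B, T, 3, 224, 224)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1))       # (B*T, 2048, 1, 1)
        return feats.flatten(1).reshape(b, t, -1)         # (B, T, 2048)


class SiameseLSTM(nn.Module):
    """Shared LSTM fuses each sequence over time; a small head scores the pair."""
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(2 * hidden_dim, 128), nn.ReLU(),
                                  nn.Linear(128, 1), nn.Sigmoid())

    def encode(self, seq):                                # seq: (B, T, feat_dim)
        _, (h, _) = self.lstm(seq)                        # final hidden state summarizes the clip
        return h[-1]                                      # (B, hidden_dim)

    def forward(self, seq_a, seq_b):
        ha, hb = self.encode(seq_a), self.encode(seq_b)   # shared weights: a Siamese pair
        return self.head(torch.cat([ha, hb], dim=1))      # copy probability in [0, 1]


if __name__ == "__main__":
    encoder, matcher = FrameEncoder(), SiameseLSTM()
    query = torch.randn(1, 8, 3, 224, 224)                # 8 frames of a query clip
    reference = torch.randn(1, 8, 3, 224, 224)            # 8 frames of a reference clip
    score = matcher(encoder(query), encoder(reference))
    print(score.shape)                                    # torch.Size([1, 1])

Pairs scored as similar by such a matcher would then be passed to the graph-based temporal stage described in the abstract to localize the copied segments; that stage is not shown here.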
Pages: 21-29
Number of pages: 9
Related Papers
50 in total
  • [21] Spatial-temporal dual-actor CNN for human interaction prediction in video
    Mahlagha Afrasiabi
    Hassan Khotanlou
    Theo Gevers
    Multimedia Tools and Applications, 2020, 79 : 20019 - 20038
  • [22] Cycle representation-disentangling network: learning to completely disentangle spatial-temporal features in video
    Sun, Pengfei
    Su, Xin
    Guo, Shangqi
    Chen, Feng
    APPLIED INTELLIGENCE, 2020, 50 (12) : 4261 - 4280
  • [23] Real-time video fire smoke detection by utilizing spatial-temporal ConvNet features
    Yaocong Hu
    Xiaobo Lu
    Multimedia Tools and Applications, 2018, 77 : 29283 - 29301
  • [24] Video saliency detection using dynamic fusion of spatial-temporal features in complex background with disturbance
    Wu, Xiaofeng
    Institute of Computing Technology, 2016 (28)
  • [26] Combination of Pyramid CNN Representation and Spatial-Temporal Representation for Facial Expression Recognition
    Xu, Shulin
    Pu, Nan
    Qian, Li
    Xiao, Guoqiang
    COMPUTER VISION, PT II, 2017, 772 : 40 - 50
  • [27] Learning a spatial-temporal texture transformer network for video inpainting
    Ma, Pengsen
    Xue, Tao
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [28] Slow Video Detection Based on Spatial-Temporal Feature Representation
    Ma, Jianyu
    Yao, Haichao
    Ni, Rongrong
    Zhao, Yao
    PATTERN RECOGNITION AND COMPUTER VISION, PT III, 2021, 13021 : 298 - 309
  • [29] ISTVT: Interpretable Spatial-Temporal Video Transformer for Deepfake Detection
    Zhao, Cairong
    Wang, Chutian
    Hu, Guosheng
    Chen, Haonan
    Liu, Chun
    Tang, Jinhui
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2023, 18 : 1335 - 1348
  • [30] Spatial-temporal graph attention network for video anomaly detection
    Chen, Haoyang
    Mei, Xue
    Ma, Zhiyuan
    Wu, Xinhong
    Wei, Yachuan
    IMAGE AND VISION COMPUTING, 2023, 131