Trajectory-Pooled Spatial-Temporal Architecture of Deep Convolutional Neural Networks for Video Event Detection

Cited by: 4
Authors
Li, Yonggang [1, 2]
Ge, Rui [1]
Ji, Yi [1]
Gong, Shengrong [1, 3]
Liu, Chunping [1, 4, 5]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[2] Jiaxing Univ, Coll Math Phys & Informat Engn, Jiaxing 314001, Peoples R China
[3] Changshu Inst Sci & Technol, Sch Comp Sci & Engn, Changshu 215500, Jiangsu, Peoples R China
[4] Jilin Univ, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Jilin, Peoples R China
[5] Collaborat Innovat Ctr Novel Software Technol & I, Nanjing 210046, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Trajectory-pooled; triple-channel; convolutional neural networks; spatial-temporal; event detection; deep feature; RECOGNITION; CLASSIFICATION; DENSE;
DOI
10.1109/TCSVT.2017.2759299
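Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract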
Content-based video event detection faces great challenges due to complex scenes and blurred actions in surveillance videos. To alleviate these challenges, we propose a novel spatial-temporal architecture of deep convolutional neural networks for this task. To take advantage of spatial-temporal information, we fine-tune two-stream networks and then fuse spatial and temporal features at the convolution layers using a 2D pooling fusion method to enforce the consistency of spatial-temporal information. Based on the two-stream networks and the spatial-temporal layer, a triple-channel model is obtained. Furthermore, we apply trajectory-constrained pooling to deep features and hand-crafted features to combine their merits. A fusion method over the three channels yields the final detection result. Experiments on two benchmark surveillance video data sets, VIRAT 1.0 and VIRAT 2.0, which involve a suite of challenging events such as a person loading an object into a vehicle or a person opening a vehicle trunk, show that the proposed method achieves superior performance compared with state-of-the-art methods on these event benchmarks.
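The abstract does not spell out how trajectory-constrained pooling is computed. As a rough illustration only, the sketch below sum-pools convolutional feature vectors at the positions a point trajectory passes through. The function name `trajectory_pool`, the `features[t][y][x]` layout, the `scale` parameter, and the choice of sum pooling are all assumptions made for this example and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical types for illustration: a feature volume indexed as
# features[t][y][x] -> list of C channel values, and a trajectory given
# as (frame, x, y) points in original video pixel coordinates.
FeatureVolume = Sequence[Sequence[Sequence[Sequence[float]]]]
Trajectory = Sequence[Tuple[int, float, float]]


def trajectory_pool(features: FeatureVolume,
                    trajectory: Trajectory,
                    scale: float) -> List[float]:
    """Sum-pool channel vectors at the feature-map cells a trajectory visits.

    `scale` is the assumed ratio of feature-map size to frame size
    (for example 1/16 for a layer with a total stride of 16).
    """
    height = len(features[0])
    width = len(features[0][0])
    channels = len(features[0][0][0])
    pooled = [0.0] * channels
    for t, x, y in trajectory:
        # Map the pixel position to a feature-map cell, clamped to the map bounds.
        col = min(max(int(round(x * scale)), 0), width - 1)
        row = min(max(int(round(y * scale)), 0), height - 1)
        for c in range(channels):
            pooled[c] += features[t][row][col][c]
    return pooled
```

Under these assumptions, each trajectory yields one fixed-length descriptor per feature layer, so descriptors from deep features and from hand-crafted features along the same trajectory could be compared or concatenated.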
Pages: 2683-2692
Number of pages: 10