Spatiotemporal Feature Fusion for Video Summarization

Cited by: 0
Authors
Kashid, Shamal [1]
Awasthi, Lalit K. [2]
Berwal, Krishan [3]
Saini, Parul [4]
Affiliations
[1] Natl Inst Technol NIT Uttarakhand, Comp Sci & Engn CSE, Srinagar 246174, India
[2] Natl Inst Technol NIT Uttarakhand, Srinagar, India
[3] Mil Coll Telecommun Engn, Mhow 453441, India
[4] Dehradun Inst Technol Univ, Dehra Dun, India
Keywords
Feature extraction; Long short term memory; Training; Benchmark testing; Logic gates; Video compression; Spatiotemporal phenomena; Convolutional neural networks; User experience; Self-organizing networks; Video on demand; Web sites;
DOI
10.1109/MMUL.2024.3428933
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Video summarization (VS) is a crucial process for compacting video content into a concise and informative representation, enhancing accessibility and the user experience. This work introduces a new approach to static VS based on spatiotemporal features derived from long short-term memory and pretrained convolutional neural network (CNN) models. It uses a dual-CNN to identify keyframes by extracting features from benchmark datasets that contain user-generated summaries as the ground truth. Additionally, the incorporation of self-organizing map clustering into the dual-CNN model is investigated for its performance advantage over alternative clustering strategies. This spatiotemporal-based VS method effectively selects the most representative frames from the extracted spatiotemporal features. Unlike traditional methods, it does not require training on specific VS datasets, eliminating the need for extensive labeled data. Compared to existing state-of-the-art techniques in the literature, the proposed approach demonstrates promising results, consistently generating high-quality video summaries across various content categories. It achieved average F-scores of 84.7%, 86.4%, 61.9%, and 53.6% on four benchmark datasets (Open Video, YouTube, TVSum, and SumMe, respectively), showing its effectiveness in producing informative video summaries.
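The keyframe-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-frame spatiotemporal features have already been extracted into a NumPy array, uses a small one-dimensional self-organizing map (the map topology, learning-rate schedule, and neighborhood schedule are all assumptions), and picks, for each map node, the frame whose feature vector lies closest to that node. The function names `train_som` and `select_keyframes` are hypothetical.

```python
import numpy as np


def train_som(features, n_nodes=4, n_iters=200, lr0=0.5, sigma0=1.0, seed=0):
    """Train a 1-D self-organizing map on per-frame feature vectors.

    features: array of shape (n_frames, dim). Hyperparameters are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    n_frames = len(features)
    # Initialize node weights from randomly chosen frames.
    init_idx = rng.choice(n_frames, size=n_nodes, replace=n_frames < n_nodes)
    weights = features[init_idx].astype(float)
    positions = np.arange(n_nodes)  # node coordinates on the 1-D map
    for t in range(n_iters):
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)   # shrinking neighborhood width
        x = features[rng.integers(n_frames)]       # one random training sample
        # Best-matching unit: the node whose weight is closest to the sample.
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighborhood around the BMU on the map.
        h = np.exp(-((positions - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def select_keyframes(features, weights):
    """Return sorted frame indices, one per non-empty SOM node."""
    # Distance from every frame to every node: shape (n_frames, n_nodes).
    dists = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    assignment = dists.argmin(axis=1)  # cluster label = nearest node
    keyframes = []
    for node in np.unique(assignment):
        members = np.flatnonzero(assignment == node)
        # Representative frame: the cluster member closest to the node weight.
        keyframes.append(int(members[dists[members, node].argmin()]))
    return sorted(keyframes)


if __name__ == "__main__":
    # Stand-in features; in practice these would come from the CNN/LSTM stage.
    demo = np.random.default_rng(1).normal(size=(60, 16))
    som = train_som(demo, n_nodes=5)
    print(select_keyframes(demo, som))
```

The number of map nodes bounds the summary length, so in a sketch like this it plays the role of the target number of keyframes; nodes that attract no frames simply contribute nothing.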
Pages: 88-97
Number of pages: 10