Spatiotemporal Feature Fusion for Video Summarization

Cited by: 0
Authors
Kashid, Shamal [1]
Awasthi, Lalit K. [2]
Berwal, Krishan [3]
Saini, Parul [4]
Affiliations
[1] Natl Inst Technol NIT Uttarakhand, Comp Sci & Engn CSE, Srinagar 246174, India
[2] Natl Inst Technol NIT Uttarakhand, Srinagar, India
[3] Mil Coll Telecommun Engn, Mhow 453441, India
[4] Dehradun Inst Technol Univ, Dehra Dun, India
Keywords
Feature extraction; Long short term memory; Training; Benchmark testing; Logic gates; Video compression; Spatiotemporal phenomena; Convolutional neural networks; User experience; Self-organizing networks; Video on demand; Web sites
DOI
10.1109/MMUL.2024.3428933
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Video summarization (VS) is a crucial process for compacting video content into a concise and informative representation, enhancing accessibility and the user experience. This work introduces a new approach to static VS based on spatiotemporal features derived from long short-term memory (LSTM) and pretrained convolutional neural network (CNN) models. It uses a dual-CNN model to identify keyframes by extracting features from benchmark datasets that contain user-generated summaries as the ground truth. Additionally, the incorporation of self-organizing map (SOM) clustering into the dual-CNN model is investigated as a way to improve performance over alternative clustering strategies. This spatiotemporal VS method selects the most representative frames from the extracted spatiotemporal features. Unlike traditional methods, it does not require training on specific VS datasets, eliminating the need for extensive labeled data. Compared to existing state-of-the-art techniques in the literature, the proposed approach demonstrates promising results, consistently generating high-quality video summaries across various content categories. It achieved average F-scores of 84.7%, 86.4%, 61.9%, and 53.6% on four benchmark datasets (Open Video, YouTube, TVSum, and SumMe, respectively), showing its effectiveness in producing informative video summaries.
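The F-scores reported above compare automatically selected keyframes against user-generated summaries. The abstract does not say how frames are matched, so the following is only a minimal sketch under stated assumptions: keyframes are given as frame indices, a selected frame matches a ground-truth frame if the two lie within a hypothetical `tolerance` of each other, and the score for a video is averaged over its user summaries. The function names `f_score` and `average_f_score` are illustrative and do not come from the paper.

```python
def f_score(selected, reference, tolerance=0):
    """Return (precision, recall, F-score) for two lists of keyframe indices.

    Assumption (not from the paper): a selected frame matches a reference
    frame if their indices differ by at most `tolerance`. Matching is
    greedy and one-to-one, so each reference frame is used at most once.
    """
    unused = sorted(reference)
    matches = 0
    for s in sorted(selected):
        for r in unused:
            if abs(s - r) <= tolerance:
                unused.remove(r)  # consume this reference frame
                matches += 1
                break
    precision = matches / len(selected) if selected else 0.0
    recall = matches / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def average_f_score(selected, user_summaries, tolerance=0):
    """Average the F-score of one automatic summary over several user summaries."""
    scores = [f_score(selected, u, tolerance)[2] for u in user_summaries]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    auto_summary = [10, 50, 90, 130]           # made-up frame indices
    users = [[12, 50, 200], [10, 50, 90, 130]]  # made-up user summaries
    print(f_score(auto_summary, users[0], tolerance=5))
    print(average_f_score(auto_summary, users, tolerance=5))
```

With the made-up indices in the usage block, two of the four selected frames match the first user summary, giving a precision of 1/2 and a recall of 2/3. Published evaluations on these datasets often match frames by visual similarity rather than by index distance, so this sketch should be read as an illustration of the metric and not as the paper's evaluation protocol.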
Pages: 88-97
Page count: 10