Spatiotemporal Feature Fusion for Video Summarization

Authors
Kashid, Shamal [1]
Awasthi, Lalit K. [2]
Berwal, Krishan [3]
Saini, Parul [4]
Affiliations
[1] Natl Inst Technol NIT Uttarakhand, Comp Sci & Engn CSE, Srinagar 246174, India
[2] Natl Inst Technol NIT Uttarakhand, Srinagar, India
[3] Mil Coll Telecommun Engn, Mhow 453441, India
[4] Dehradun Inst Technol Univ, Dehra Dun, India
Keywords
Feature extraction; Long short term memory; Training; Benchmark testing; Logic gates; Video compression; Spatiotemporal phenomena; Convolutional neural networks; User experience; Self-organizing networks; Video on demand; Web sites
DOI
10.1109/MMUL.2024.3428933
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Video summarization (VS) is a crucial process for compacting video content into a concise and informative representation, enhancing accessibility and the user experience. This work introduces a new approach to static VS based on spatiotemporal features derived from long short-term memory (LSTM) and pretrained convolutional neural network (CNN) models. It uses a dual-CNN model to identify keyframes by extracting features from benchmark datasets that contain user-generated summaries as the ground truth. Additionally, the incorporation of self-organizing map (SOM) clustering into the dual-CNN model is investigated as a means of achieving better performance than alternative clustering strategies. This spatiotemporal-based VS method selects the most representative frames from the extracted spatiotemporal features. Unlike traditional methods, it does not require training on specific VS datasets, eliminating the need for extensive labeled data. Compared to existing state-of-the-art techniques in the literature, the proposed approach demonstrates promising results, consistently generating high-quality video summaries across various content categories. It achieved average F-scores of 84.7%, 86.4%, 61.9%, and 53.6% on four benchmark datasets (Open Video, YouTube, TVSum, and SumMe, respectively), showing its effectiveness in producing informative video summaries.
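The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: cluster per-frame feature vectors with a self-organizing map, take the frame closest to each map node as a keyframe, and score the selection against a reference summary with an F-score. Everything concrete here is an assumption made for illustration: the function names (`train_som`, `select_keyframes`, `f_score`), the 1-D map, the decay schedules, the index-tolerance matching rule, and the synthetic features that stand in for the CNN/LSTM features used in the paper.

```python
import numpy as np


def train_som(features, n_nodes=5, n_iters=200, lr0=0.5, seed=0):
    """Train a small 1-D self-organizing map on row-vector frame features."""
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Initialise node weights from randomly chosen frames (assumption).
    init = rng.choice(n, size=n_nodes, replace=n < n_nodes)
    weights = features[init].astype(float)
    sigma0 = max(n_nodes / 2.0, 1.0)
    grid = np.arange(n_nodes)
    for t in range(n_iters):
        frac = t / n_iters
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood width
        x = features[rng.integers(n)]
        bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
        h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights


def select_keyframes(features, weights):
    """Return sorted indices of the frame closest to each non-empty SOM node."""
    dists = np.linalg.norm(features[:, None, :] - weights[None, :, :], axis=2)
    assign = dists.argmin(axis=1)                # node assigned to each frame
    keyframes = []
    for node in np.unique(assign):
        members = np.flatnonzero(assign == node)
        keyframes.append(int(members[dists[members, node].argmin()]))
    return sorted(keyframes)


def f_score(predicted, reference, tol=0):
    """F-score with greedy one-to-one matching of frame indices within `tol`."""
    unmatched = sorted(reference)
    matches = 0
    for p in sorted(predicted):
        for r in unmatched:
            if abs(p - r) <= tol:
                unmatched.remove(r)
                matches += 1
                break
    if not predicted or not reference or matches == 0:
        return 0.0
    precision = matches / len(predicted)
    recall = matches / len(reference)
    return 2 * precision * recall / (precision + recall)


# Synthetic stand-in for real features: three "shots" of 20 frames each.
rng = np.random.default_rng(42)
centers = rng.normal(size=(3, 16)) * 5.0
features = np.vstack([c + rng.normal(scale=0.3, size=(20, 16)) for c in centers])

weights = train_som(features, n_nodes=4)
keyframes = select_keyframes(features, weights)
print("selected keyframes:", keyframes)
```

In practice the rows of `features` would come from a pretrained feature extractor, and the matching rule in `f_score` would follow whatever protocol the chosen benchmark defines; the index tolerance used here is only a placeholder for that protocol.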
Pages: 88-97
Number of pages: 10