Unsupervised Video Summarization Based on the Diffusion Model of Feature Fusion

Times cited: 0
Authors
Yu, Qinghao [1]
Yu, Hui [1, 2]
Sun, Ying [3]
Ding, Derui [1]
Jian, Muwei [4, 5]
Affiliations
[1] Univ Shanghai Sci & Technol, Sch Control Engn, Shanghai 200093, Peoples R China
[2] Univ Portsmouth, Sch Creat Technol, Portsmouth PO1 2DJ, England
[3] Univ Shanghai Sci & Technol, Business Sch, Shanghai 200093, Peoples R China
[4] Shandong Univ Finance & Econ, Sch Comp Sci & Technol, Jinan 250014, Peoples R China
[5] Linyi Univ, Sch Informat Sci & Technol, Linyi, Peoples R China
Keywords
Feature extraction; Generative adversarial networks; Training; Accidents; Mathematical models; Data mining; Gaussian noise; Coarse-fine frame selector (CFFS); diffusion model; feature fusion; multigrained; unsupervised video summarization; GAN
DOI
10.1109/TCSS.2024.3384627
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Video summarization (VS) technologies automatically extract key frames that carry the essential information, which helps viewers identify events quickly and speeds up decision-making, especially for accidents. With the rapid development of deep learning, many unsupervised VS methods based on generative adversarial networks (GANs) and reinforcement learning (RL) have been proposed in recent years. However, GAN-based methods can suffer from unstable training, and RL-based methods from the difficulty of formulating a reward function. To address these problems, this article presents an unsupervised VS method called the diffusion model of feature fusion (DMFF), which consists of a diffusion module (DM), a feature extraction and compression module (FECM), and a coarse-fine frame selector (CFFS). The DM is designed to avoid the training instability caused by alternately training the generator and discriminator of a GAN. The FECM extracts and compresses video features. The CFFS captures both low-level and high-level features between frames in order to handle complex and diverse accident videos. High-level local and global features are then fused to generate a multigrained final frame score. Experiments on two widely used benchmark datasets, SumMe and TVSum, demonstrate that the proposed network is effective, outperforms state-of-the-art methods, and trains more stably.
Pages: 6010-6021
Page count: 12
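
The abstract states that local and global features are fused into a multigrained final frame score, from which key frames are selected. The record does not say how the fusion is performed, so the following is only a minimal sketch under stated assumptions: the fusion is taken to be a convex combination with a hypothetical weight `alpha`, the score values are made up for illustration, and the function names `fuse_scores` and `select_key_frames` are not from the paper.

```python
def fuse_scores(local_scores, global_scores, alpha=0.5):
    """Blend per-frame local and global scores.

    Assumption: a convex combination weighted by `alpha`; the paper's
    actual fusion rule is not given in the abstract.
    """
    if len(local_scores) != len(global_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * l + (1.0 - alpha) * g
            for l, g in zip(local_scores, global_scores)]


def select_key_frames(scores, budget):
    """Return the indices of the `budget` highest-scoring frames in temporal order."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Illustrative (made-up) scores for a five-frame clip.
local_scores = [0.9, 0.2, 0.4, 0.8, 0.1]
global_scores = [0.7, 0.3, 0.9, 0.2, 0.1]

fused = fuse_scores(local_scores, global_scores, alpha=0.5)
key_frames = select_key_frames(fused, budget=2)
print(key_frames)
```

With equal weighting, a frame that scores well on both the local and the global view outranks a frame that is strong on only one of them, which is the intuition behind combining granularities.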