Video Joint Modelling Based on Hierarchical Transformer for Co-Summarization

Times cited: 14

Authors:

Li, Haopeng [1]

Ke, Qiuhong [2]

Gong, Mingming [3]

Zhang, Rui [4]

Affiliations:

[1] Univ Melbourne, Sch Comp & Informat Syst, Parkville, Vic 3010, Australia

[2] Monash Univ, Dept Data Sci & AI, Parkville, Vic 3010, Australia

[3] Univ Melbourne, Sch Math & Stat, Parkville, Vic 3010, Australia

[4] Tsinghua Univ, Beijing 100190, Peoples R China

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2023 / Vol. 45 / No. 03

Keywords:

Transformers; Semantics; Correlation; Computational modeling; Training; Task analysis; Video on demand; Video summarization; co-summarization; hierarchical transformer; representation reconstruction;

DOI:

10.1109/TPAMI.2022.3186506

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video summarization aims to automatically generate a summary (storyboard or video skim) of a video, which can facilitate large-scale video retrieval and browsing. Most existing methods summarize each video individually, neglecting the correlations among similar videos. Such correlations, however, are also informative for video understanding and video summarization. To address this limitation, we propose Video Joint Modelling based on Hierarchical Transformer (VJMHT) for co-summarization, which takes into consideration the semantic dependencies across videos. Specifically, VJMHT consists of two layers of Transformer: the first layer extracts semantic representations from individual shots of similar videos, while the second layer performs shot-level video joint modelling to aggregate cross-video semantic information. In this way, complete cross-video high-level patterns are explicitly modelled and learned for the summarization of individual videos. Moreover, Transformer-based video representation reconstruction is introduced to maximize the high-level similarity between the summary and the original video. Extensive experiments verify the effectiveness of the proposed modules and the superiority of VJMHT in terms of F-measure and rank-based evaluation.
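The two-level structure described in the abstract (attention within shots, then attention across the shots of several similar videos) can be illustrated with a minimal NumPy sketch. It is not the authors' VJMHT implementation. It uses single-head attention with random, untrained projections, assumes fixed-length shots, mean-pools each shot into one embedding, and adds a hypothetical sigmoid scoring head. Layer norms, feed-forward blocks, training, and the representation-reconstruction objective are all omitted; the function names are illustrative only.

```python
import numpy as np


def init_attention(d, rng):
    """Random query/key/value projections for one single-head attention layer."""
    return tuple(rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))


def self_attention(x, params):
    """Scaled dot-product self-attention over the rows of x (shape: n x d)."""
    wq, wk, wv = params
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability before exp
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ v


def hierarchical_shot_scores(videos, frames_per_shot, rng):
    """Score shots of several similar videos with a two-level attention stack.

    videos: list of (num_frames, d) arrays of per-frame features.
    Returns one array of shot scores in [0, 1] per video.
    """
    d = videos[0].shape[-1]
    shot_layer = init_attention(d, rng)   # level 1: shared by all shots
    joint_layer = init_attention(d, rng)  # level 2: cross-video shot modelling
    w_out = rng.standard_normal(d) / np.sqrt(d)  # hypothetical scoring head

    shot_embs, owners = [], []
    for vid, frames in enumerate(videos):
        for s in range(len(frames) // frames_per_shot):
            shot = frames[s * frames_per_shot:(s + 1) * frames_per_shot]
            # Level 1: attend within the shot, then mean-pool (an assumption)
            shot_embs.append(self_attention(shot, shot_layer).mean(axis=0))
            owners.append(vid)

    # Level 2: every shot attends to every shot of every video in the group
    joint = self_attention(np.stack(shot_embs), joint_layer)
    scores = 1.0 / (1.0 + np.exp(-(joint @ w_out)))  # sigmoid importance scores
    owners = np.array(owners)
    return [scores[owners == vid] for vid in range(len(videos))]


rng = np.random.default_rng(0)
videos = [rng.standard_normal((40, 16)), rng.standard_normal((24, 16))]
for vid, s in enumerate(hierarchical_shot_scores(videos, frames_per_shot=8, rng=rng)):
    print(vid, np.round(s, 3))
```

Because the second attention layer runs over the pooled shots of all videos at once, each shot's score depends on shots from the other videos in the group, which is the cross-video dependency the abstract emphasizes.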


Pages: 3904–3917

Number of pages: 14

50 records in total

[31] Hierarchical Recurrent Neural Network for Video Summarization
Zhao, Bin
Li, Xuelong
Lu, Xiaoqiang
PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 863 - 871
[32] A Hierarchical Visual Model for Video Object Summarization
Liu, David
Hua, Gang
Chen, Tsuhan
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2010, 32 (12) : 2178 - 2190
[33] Video summarization with temporal-channel visual transformer
Tian, Xiaoyan
Jin, Ye
Zhang, Zhao
Liu, Peng
Tang, Xianglong
PATTERN RECOGNITION, 2025, 165
[34] Unsupervised Video Summarization via Dynamic Modeling-based Hierarchical Clustering
Mahmoud, Karim M.
Ghanem, Nagia M.
Ismail, Mohamed A.
2013 12TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA 2013), VOL 2, 2013, : 303 - 308
[35] OHiFormer: Object-Wise Hierarchical Dependency-Based Transformer for Screen Summarization
Ji Han, Ye
Lee, Soyeon
Kim, Jin Sob
Lee, Byung Hoon
Han, Sung Won
IEEE ACCESS, 2024, 12 : 101313 - 101324
[36] Hierarchical Video Summarization Extraction Algorithm in Compressed Domain
Li Xiang-wei
Zhao Li-dong
Zhao Kai
INTERNATIONAL CONFERENCE ON APPLIED PHYSICS AND INDUSTRIAL ENGINEERING 2012, PT C, 2012, 24 : 2360 - 2366
[37] LEARNING HIERARCHICAL SELF-ATTENTION FOR VIDEO SUMMARIZATION
Liu, Yen-Ting
Li, Yu-Jhe
Yang, Fu-En
Chen, Shang-Fu
Wang, Yu-Chiang Frank
2019 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2019, : 3377 - 3381
[38] Hierarchical Visual Interface for Educational Video Retrieval and Summarization
Weng, Jiahao
Zhang, Chao
Yang, Xi
Xie, Haoran
INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY (IWAIT) 2022, 2022, 12177
[39] Deep hierarchical LSTM networks with attention for video summarization
Lin, Jingxu
Zhong, Sheng-hua
Fares, Ahmed
COMPUTERS & ELECTRICAL ENGINEERING, 2022, 97
[40] Topic-aware video summarization using multimodal transformer
Zhu, Yubo
Zhao, Wentian
Hua, Rui
Wu, Xinxiao
PATTERN RECOGNITION, 2023, 140
