UBiSS: A Unified Framework for Bimodal Semantic Summarization of Videos

Cited by: 0
Authors
Mei, Yuting [1]
Yao, Linli [2]
Jin, Qin [1]
Affiliations
[1] Renmin University of China, Beijing, People's Republic of China
[2] Peking University, Beijing, People's Republic of China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
video summarization; video understanding; multimodal semantics;
DOI
10.1145/3652583.3658038
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the surge in video data, video summarization techniques, including visual-modal (VM) and textual-modal (TM) summarization, are attracting increasing attention. However, unimodal summarization inevitably loses the rich semantics of the video. In this paper, we focus on a more comprehensive video summarization task named Bimodal Semantic Summarization of Videos (BiSSV). Specifically, we first construct a large-scale dataset, BIDS, in (video, VM-Summary, TM-Summary) triplet format. Unlike traditional processing methods, our construction procedure contains a VM-Summary extraction algorithm that aims to preserve the most salient content within long videos. Based on BIDS, we propose a unified framework, UBiSS, for the BiSSV task, which models the saliency information in the video and generates a TM-Summary and a VM-Summary simultaneously. We further optimize our model with a list-wise ranking-based objective to improve its capacity to capture highlights. Lastly, we propose a metric, NDCG(MS), to provide a joint evaluation of the bimodal summary. Experiments show that our unified framework achieves better performance than multi-stage summarization pipelines. Code and data are available at https://github.com/MeiYutingg/UBiSS.
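The abstract refers to a list-wise ranking objective and an NDCG-based metric without defining either. As background only, the sketch below computes the standard normalized discounted cumulative gain (NDCG) over per-segment saliency scores. It is not the paper's NDCG(MS) metric, whose definition does not appear in this record; the function names `dcg` and `ndcg`, the linear gain, and the sample scores are all assumptions made for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance values given in ranked order."""
    # Position 0 is discounted by log2(2) = 1, position 1 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(predicted_scores, true_relevances, k=None):
    """Standard NDCG@k (linear gain); illustrative, not the paper's NDCG(MS)."""
    if len(predicted_scores) != len(true_relevances):
        raise ValueError("score and relevance lists must have the same length")
    k = len(predicted_scores) if k is None else k
    # Rank segments by predicted saliency, highest first.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked = [true_relevances[i] for i in order[:k]]
    # The ideal ranking sorts segments by their true relevance.
    ideal = sorted(true_relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical saliency predictions and ground-truth relevance for 3 segments.
    print(ndcg([0.9, 0.1, 0.5], [3, 0, 1]))  # prediction order matches the ideal order
    print(ndcg([0.1, 0.9, 0.5], [3, 0, 1]))  # most relevant segment ranked last
```

A score of 1.0 means the predicted ordering of segments matches the ideal ordering; lower values penalize relevant segments that are ranked late.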
Pages: 1034-1042
Page count: 9