A Survey on Multi-modal Summarization

Cited by: 22
Authors
Jangra, Anubhav [1]
Mukherjee, Sourajit [2]
Jatowt, Adam [3, 4]
Saha, Sriparna [1]
Hasanuzzaman, Mohammad [5]
Affiliations
[1] Indian Inst Technol Patna, Dept Comp Sci, Patna 801106, Bihar, India
[2] Indian Inst Technol Patna, Dept Math, Patna, Bihar, India
[3] Univ Innsbruck, Dept Informat, Innsbruck, Austria
[4] Univ Innsbruck, DiSC, Innsbruck, Austria
[5] Cork Inst Technol, Dept Comp Sci, Cork, Ireland
Keywords
Summarization; multi-modal content processing; neural networks; FUSION; VIDEO; LANGUAGE; SALIENCY; REVIEWS
DOI
10.1145/3584700
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The new era of technology has brought us to the point where it is convenient for people to share their opinions over an abundance of platforms. These platforms have a provision for the users to express themselves in multiple forms of representations, including text, images, videos, and audio. This, however, makes it difficult for users to obtain all the key information about a topic, making the task of automatic multi-modal summarization (MMS) essential. In this article, we present a comprehensive survey of the existing research in the area of MMS, covering various modalities such as text, image, audio, and video. Apart from highlighting the different evaluation metrics and datasets used for the MMS task, our work also discusses the current challenges and future directions in this field.
Pages: 36