Topic-guided abstractive multimodal summarization with multimodal output

Cited by: 2
Authors
Rafi, Shaik [1]
Das, Ranjita [2]
Affiliations
[1] Natl Inst Technol Mizoram, Dept Comp Sci & Engn, Aizawl 796012, Mizoram, India
[2] Natl Inst Technol Agartala, Dept Comp Sci & Engn, Agartala 799046, Tripura, India
Keywords
Multimodal Abstractive Summary; Topic Modelling; Latent Dirichlet Allocation; Attention Mechanism; Fusion
DOI
10.1007/s00521-023-08821-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Summarization produces condensed text from large text documents, typically using deep-learning techniques. Over the past few years, abstractive summarization has drawn much attention because of its ability to generate human-like sentences automatically. However, it still suffers from repetition, redundancy and lexical problems when generating sentences. Previous studies show that incorporating images alongside the text modality in an abstractive summary may reduce redundancy, but attention still needs to be paid to the semantics of the sentences. This paper considers adding a topic to a multimodal summary to address semantic and linguistic problems, which stresses the need to develop a multimodal summarization system that uses the topic. Multimodal summarization uses two or more modalities to extract essential features and so increase user satisfaction with the generated abstractive summary. The paper's primary aim is to explore the generation of user-preferred summaries on a particular topic. It proposes Hybrid Image Text Topic (HITT), which uses the topic to guide the essential information extracted from the text and image modalities, addressing semantic and linguistic problems, in order to generate a topic-guided abstractive multimodal summary. Furthermore, a caption-summary order space technique is introduced to retrieve the relevant image for the generated summary. Finally, the results are compared and validated on the MSMO dataset using ROUGE and image precision scores. We also calculated the model's loss using sparse categorical cross-entropy and showed significant improvement over other state-of-the-art techniques.
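Two of the quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy probabilities and the image file names are hypothetical, and image precision is assumed here to be the fraction of recommended images that also appear in the reference set.

```python
import math


def sparse_categorical_cross_entropy(probs, targets):
    """Mean negative log-probability assigned to the target token ids.

    probs:   one probability distribution (list of floats) per decoding step
    targets: one integer token id per decoding step ("sparse" labels,
             i.e. class indices rather than one-hot vectors)
    """
    losses = [-math.log(step[t]) for step, t in zip(probs, targets)]
    return sum(losses) / len(losses)


def image_precision(recommended, reference):
    """Assumed definition: |recommended ∩ reference| / |recommended|."""
    recommended = set(recommended)
    if not recommended:
        return 0.0
    return len(recommended & set(reference)) / len(recommended)


# Hypothetical toy data: two decoding steps over a three-token vocabulary.
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
targets = [0, 1]
loss = sparse_categorical_cross_entropy(probs, targets)

# Hypothetical image ids: one of the two recommended images is in the reference set.
ip = image_precision(["img1.jpg", "img2.jpg"], ["img2.jpg", "img3.jpg"])
```

With integer targets the loss only needs the probability at the target index, which is why the sparse form avoids building one-hot vectors over the vocabulary.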
Pages: 16