Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism

Cited by: 1
|
Authors
Argade, Dakshata [1 ]
Khairnar, Vaishali [1 ]
Vora, Deepali [2 ]
Patil, Shruti [2 ,3 ]
Kotecha, Ketan [3 ]
Alfarhood, Sultan [4 ]
Affiliations
[1] Terna Engn Coll, Navi Mumbai 400706, India
[2] Symbiosis Inst Technol Deemed Univ, Symbiosis Int Technol, Pune Campus, Pune 412115, India
[3] Symbiosis Int Deemed Univ SIU, Symbiosis Inst Technol Pune Campus, Symbiosis Ctr Appl Artificial Intelligence SCAAI, Pune 412115, India
[4] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 51178, Riyadh 51178, Saudi Arabia
Keywords
Attention mechanism; Bidirectional encoder representations from transformer; Decoder; Encoder; Multimodalities; Multimodal abstractive summarization;
DOI
10.1016/j.heliyon.2024.e26162
Chinese Library Classification (CLC)
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject Classification Codes
07; 0710; 09;
Abstract
In recent decades, abstractive text summarization from multimodal input has attracted many researchers because it can gather information from several sources to create a concise summary. However, existing multimodal summarization methodologies produce summaries only for short videos and give poor results on lengthy videos. To address these issues, this research presents Multimodal Abstractive Summarization using Bidirectional Encoder Representations from Transformers (MAS-BERT) with an attention mechanism. The purpose of video summarization is to speed up search over a large collection of videos, so that users can quickly decide whether a video is relevant by reading its summary. Initially, the data is obtained from the publicly available How2 dataset and is encoded using a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder and a Long Short-Term Memory (LSTM) encoder: the textual data, passed through the embedding layer, is encoded with the Bi-GRU encoder, while the audio and video features are encoded with the LSTM encoder. A BERT-based attention mechanism then combines the modalities, and finally a Bi-GRU-based decoder generates the summary from the fused multimodal representation. Experimental results show that the proposed MAS-BERT achieves a ROUGE-1 score of 60.2, whereas the existing Decoder-only Multimodal Transformer (DMmT) and the Factorized Multimodal Transformer based Decoder-Only Language model (FLORAL) achieve 49.58 and 56.89, respectively. Our work gives users better contextual information and a better experience, and can help video-sharing platforms retain customers by letting users judge a video's relevance from its summary.
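The abstract describes a four-stage pipeline: a Bi-GRU encoder over embedded text, an LSTM encoder over audio/video features, BERT-style attention to fuse the two streams, and a GRU-based decoder. The PyTorch sketch below illustrates one plausible wiring of that pipeline. The class name MASBERTSketch, all dimensions, the mean-pooling step, and the use of nn.MultiheadAttention as a stand-in for the BERT-based fusion are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn

class MASBERTSketch(nn.Module):
    """Hypothetical sketch of the MAS-BERT pipeline from the abstract.
    Module sizes and wiring are assumptions, not the paper's exact design."""

    def __init__(self, vocab_size, d_model=256, av_dim=128, n_heads=8):
        super().__init__()
        # Embedding layer + bidirectional GRU encoder for the transcript text.
        self.embed = nn.Embedding(vocab_size, d_model)
        self.text_enc = nn.GRU(d_model, d_model // 2,
                               batch_first=True, bidirectional=True)
        # LSTM encoder for precomputed audio/video feature sequences.
        self.av_enc = nn.LSTM(av_dim, d_model, batch_first=True)
        # BERT-style multi-head attention fuses the modalities:
        # text states query the audio/video states (an assumed fusion scheme).
        self.fuse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # The paper specifies a Bi-GRU decoder; a unidirectional GRU is used
        # here so the sketch stays causal for token-by-token generation.
        self.dec = nn.GRU(d_model, d_model, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, text_ids, av_feats, dec_in):
        t, _ = self.text_enc(self.embed(text_ids))   # (B, Lt, d_model)
        a, _ = self.av_enc(av_feats)                 # (B, La, d_model)
        fused, _ = self.fuse(t, a, a)                # cross-modal attention
        h0 = fused.mean(dim=1).unsqueeze(0)          # pooled context as h0
        d, _ = self.dec(self.embed(dec_in), h0)      # (B, Ls, d_model)
        return self.out(d)                           # vocabulary logits

# Shape check on random inputs: 2 videos, 50 transcript tokens,
# 120 audio/video feature frames, 20 decoder steps.
model = MASBERTSketch(vocab_size=32000)
logits = model(torch.randint(0, 32000, (2, 50)),
               torch.randn(2, 120, 128),
               torch.randint(0, 32000, (2, 20)))
print(logits.shape)  # torch.Size([2, 20, 32000])

Using the text states as queries over the audio/video states is one common cross-modal fusion choice; the paper's BERT-based mechanism and its Bi-GRU decoder may be wired differently.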
Pages: 11
Related Papers
50 records in total
  • [31] Bilingual Question Answering System Using Bidirectional Encoder Representations from Transformers and Best Matching Method
    Navastara, Dini Adni
    Ihdiannaja
Arifin, Agus Zainal
    PROCEEDINGS OF 2021 13TH INTERNATIONAL CONFERENCE ON INFORMATION & COMMUNICATION TECHNOLOGY AND SYSTEM (ICTS), 2021, : 360 - 364
  • [32] Prediction of Machine-Generated Financial Tweets Using Advanced Bidirectional Encoder Representations from Transformers
    Arshed, Muhammad Asad
    Gherghina, Stefan Cristian
    Dur-E-Zahra
    Manzoor, Mahnoor
    ELECTRONICS, 2024, 13 (11)
  • [33] Using Data Augmentation and Bidirectional Encoder Representations from Transformers for Improving Punjabi Named Entity Recognition
    Khalid, Hamza
    Murtaza, Ghulam
    Abbas, Qaiser
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (06)
  • [34] Abstractive Text Summarization Using Enhanced Attention Model
    Roul, Rajendra Kumar
    Joshi, Pratik Madhav
    Sahoo, Jajati Keshari
    INTELLIGENT HUMAN COMPUTER INTERACTION (IHCI 2019), 2020, 11886 : 63 - 76
  • [35] Self-Attention Guided Copy Mechanism for Abstractive Summarization
    Xu, Song
    Li, Haoran
    Yuan, Peng
    Wu, Youzheng
    He, Xiaodong
    Zhou, Bowen
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 1355 - 1362
  • [36] Railroad accident causal analysis with unstructured narratives using bidirectional encoder representations for transformers
    Song, Bing
    Ma, Xiaoping
    Qin, Yong
    Hu, Hao
    Zhang, Zhipeng
    JOURNAL OF TRANSPORTATION SAFETY & SECURITY, 2023, 15 (07) : 717 - 736
  • [38] An abstractive text summarization technique using transformer model with self-attention mechanism
    Kumar, Sandeep
    Solanki, Arun
NEURAL COMPUTING & APPLICATIONS, 2023, 35 (25): 18603 - 18622
  • [39] News image text classification algorithm with bidirectional encoder representations from transformers model
    Shi, Zhan
    Fan, Chongjun
    Li, Ying
    Zhang, Hongliu
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (01)
  • [40] SeMalBERT: Semantic-based malware detection with bidirectional encoder representations from transformers
    Liu, Junming
    Zhao, Yuntao
    Feng, Yongxin
    Hu, Yutao
    Ma, Xiangyu
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2024, 80