Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism

Cited by: 1
Authors
Argade, Dakshata [1 ]
Khairnar, Vaishali [1 ]
Vora, Deepali [2 ]
Patil, Shruti [2 ,3 ]
Kotecha, Ketan [3 ]
Alfarhood, Sultan [4 ]
Affiliations
[1] Terna Engn Coll, Navi Mumbai 400706, India
[2] Symbiosis Inst Technol Deemed Univ, Symbiosis Int Technol, Pune Campus, Pune 412115, India
[3] Symbiosis Int Deemed Univ SIU, Symbiosis Inst Technol Pune Campus, Symbiosis Ctr Appl Artificial Intelligence SCAAI, Pune 412115, India
[4] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 51178, Riyadh 51178, Saudi Arabia
Keywords
Attention mechanism; Bidirectional encoder representations from transformer; Decoder; Encoder; Multimodalities; Multimodal abstractive summarization;
DOI
10.1016/j.heliyon.2024.e26162
CLC Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject Classification
07; 0710; 09
Abstract
In recent decades, abstractive text summarization from multimodal input has attracted many researchers because it can gather information from multiple sources to produce a concise summary. However, existing multimodal summarization methods produce usable summaries only for short videos and perform poorly on lengthy ones. To address these issues, this research presents Multimodal Abstractive Summarization using Bidirectional Encoder Representations from Transformers (MAS-BERT) with an attention mechanism. The purpose of video summarization is to speed up search over large video collections, so that users can quickly decide whether a video is relevant by reading its summary. The data is obtained from the publicly available How2 dataset. Textual data is embedded in an embedding layer and encoded with a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder, while audio and video features are encoded with a Long Short-Term Memory (LSTM) encoder. A BERT-based attention mechanism then combines the modalities, and a Bi-GRU based decoder generates the summary. Experimental results show that the proposed MAS-BERT achieves a Rouge-1 score of 60.2, whereas the existing Decoder-only Multimodal Transformer (DMmT) and the Factorized Multimodal Transformer based Decoder-Only Language model (FLORAL) achieve 49.58 and 56.89, respectively. Our work gives users richer contextual information and a better experience, and would help video-sharing platforms retain customers by letting users judge a video's relevance from its summary.
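To make the pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract describes: a Bi-GRU encoder over embedded text, LSTM encoders over audio and video features, a BERT-style multi-head attention that fuses the modalities, and a Bi-GRU decoder head. Every module name, dimension, and hyperparameter here is an illustrative assumption rather than the authors' implementation, and the sketch omits the autoregressive decoding and beam search a full summarizer would need.

```python
import torch
import torch.nn as nn

class MASBERTSketch(nn.Module):
    """Illustrative sketch of the MAS-BERT pipeline from the abstract:
    Bi-GRU text encoder, LSTM audio/video encoders, BERT-style
    multi-head attention fusion, and a Bi-GRU decoder.
    All names and sizes are assumptions, not the authors' code."""

    def __init__(self, vocab_size, d_model=256, av_dim=128, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)           # text embedding layer
        self.text_enc = nn.GRU(d_model, d_model // 2,
                               bidirectional=True, batch_first=True)
        self.audio_enc = nn.LSTM(av_dim, d_model, batch_first=True)
        self.video_enc = nn.LSTM(av_dim, d_model, batch_first=True)
        # BERT-style multi-head attention: text queries attend to audio/video keys
        self.fuse = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.decoder = nn.GRU(d_model, d_model // 2,
                              bidirectional=True, batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, audio_feats, video_feats):
        text, _ = self.text_enc(self.embed(tokens))              # (B, Lt, d_model)
        audio, _ = self.audio_enc(audio_feats)                   # (B, La, d_model)
        video, _ = self.video_enc(video_feats)                   # (B, Lv, d_model)
        av = torch.cat([audio, video], dim=1)                    # concatenate modalities
        fused, _ = self.fuse(query=text, key=av, value=av)       # cross-modal fusion
        hidden, _ = self.decoder(fused)                          # Bi-GRU decoder states
        return self.out(hidden)                                  # per-token vocab logits

# Toy usage with random features standing in for How2 inputs
model = MASBERTSketch(vocab_size=10_000)
logits = model(torch.randint(0, 10_000, (2, 20)),
               torch.randn(2, 50, 128), torch.randn(2, 50, 128))
print(logits.shape)  # torch.Size([2, 20, 10000])
```

In this sketch the text stream supplies the attention queries so the fused representation stays grounded in the transcript, which matches the text-centred fusion the abstract implies; a production system would decode summary tokens step by step rather than emitting logits per input token.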
Pages: 11
Related Papers (50 in total)
  • [1] Investigation of Pre-Trained Bidirectional Encoder Representations from Transformers Checkpoints for Indonesian Abstractive Text Summarization
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGY-MALAYSIA, 2022, 21 (01): 71-94
  • [2] MalBERT: Malware Detection using Bidirectional Encoder Representations from Transformers
    Rahali, Abir
    Akhloufi, Moulay A.
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021: 3226-3231
  • [3] Cyberbullying Detection Using Bidirectional Encoder Representations from Transformers (BERT)
    Sujud, Razan
    Fahs, Walid
    Khatoun, Rida
    Chbib, Fadlallah
    2024 IEEE INTERNATIONAL MEDITERRANEAN CONFERENCE ON COMMUNICATIONS AND NETWORKING, MEDITCOM 2024, 2024: 257-262
  • [4] A Literature Review on Bidirectional Encoder Representations from Transformers
    Shreyashree, S.
    Sunagar, Pramod
    Rajarajeswari, S.
    Kanavalli, Anita
    INVENTIVE COMPUTATION AND INFORMATION TECHNOLOGIES, ICICIT 2021, 2022, 336: 305-320
  • [5] Transient chaos in bidirectional encoder representations from transformers
    Inoue, Katsuma
    Ohara, Soh
    Kuniyoshi, Yasuo
    Nakajima, Kohei
    PHYSICAL REVIEW RESEARCH, 2022, 4 (01)
  • [6] Introducing bidirectional attention for autoregressive models in abstractive summarization
    Zhao, Jianfei
    Sun, Xin
    Feng, Chong
    INFORMATION SCIENCES, 2025, 689
  • [7] Study of Low Resource Language Document Extractive Summarization using Lexical chain and Bidirectional Encoder Representations from Transformers (BERT)
    Deshpande, Pranjali
    Jahirabadkar, Sunita
    2021 INTERNATIONAL CONFERENCE ON COMPUTATIONAL PERFORMANCE EVALUATION (COMPE-2021), 2021: 457-461
  • [8] DIBERT: Dependency Injected Bidirectional Encoder Representations from Transformers
    Wahab, Abdul
    Sifa, Rafet
    2021 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2021), 2021
  • [9] Contrastive Attention Mechanism for Abstractive Sentence Summarization
    Duan, Xiangyu
    Yu, Hongfei
    Yin, Mingming
    Zhang, Min
    Luo, Weihua
    Zhang, Yue
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019: 3044-3053
  • [10] Emotion detection on Greek social media using Bidirectional Encoder Representations from Transformers
    Alexandridis, Georgios
    Korovesis, Konstantinos
    Varlamis, Iraklis
    Tsantilas, Panagiotis
    Caridakis, George
    25TH PAN-HELLENIC CONFERENCE ON INFORMATICS WITH INTERNATIONAL PARTICIPATION (PCI2021), 2021: 28-32