Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism

Cited by: 1
Authors
Argade, Dakshata [1 ]
Khairnar, Vaishali [1 ]
Vora, Deepali [2 ]
Patil, Shruti [2 ,3 ]
Kotecha, Ketan [3 ]
Alfarhood, Sultan [4 ]
Affiliations
[1] Terna Engn Coll, Navi Mumbai 400706, India
[2] Symbiosis Inst Technol Deemed Univ, Symbiosis Int Technol, Pune Campus, Pune 412115, India
[3] Symbiosis Int Deemed Univ SIU, Symbiosis Inst Technol Pune Campus, Symbiosis Ctr Appl Artificial Intelligence SCAAI, Pune 412115, India
[4] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 51178, Riyadh 51178, Saudi Arabia
Keywords
Attention mechanism; Bidirectional encoder representations from transformers; Decoder; Encoder; Multimodalities; Multimodal abstractive summarization;
DOI
10.1016/j.heliyon.2024.e26162
CLC classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline codes
07; 0710; 09;
Abstract
In recent decades, abstractive text summarization from multimodal input has attracted many researchers because of its ability to gather information from various sources into a concise summary. However, existing multimodal summarization methods produce summaries only for short videos and perform poorly on lengthy ones. To address these issues, this research presents Multimodal Abstractive Summarization using Bidirectional Encoder Representations from Transformers (MAS-BERT) with an attention mechanism. The purpose of video summarization is to speed up searching through a large collection of videos, so that users can quickly decide whether a video is relevant by reading its summary. Initially, the data is obtained from the publicly available How2 dataset and is encoded using a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder and a Long Short-Term Memory (LSTM) encoder: the textual data, after passing through the embedding layer, is encoded with the Bi-GRU encoder, while the audio and video features are encoded with the LSTM encoder. A BERT-based attention mechanism then combines the modalities, and finally a Bi-GRU-based decoder summarizes them. Experimental results show that the proposed MAS-BERT achieves a better Rouge-1 score of 60.2, whereas the existing Decoder-only Multimodal Transformer (DMmT) and the Factorized Multimodal Transformer based Decoder-Only Language model (FLORAL) achieve 49.58 and 56.89, respectively. Our work benefits users by providing better contextual information and a better user experience, and would help video-sharing platforms retain customers by allowing users to find relevant videos by reading their summaries.
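The fusion step described in the abstract, where text encoder states attend over audio/video encoder states before decoding, can be sketched as single-head scaled dot-product cross-attention. This is a simplified, weight-free stand-in for the paper's BERT-based attention mechanism; all dimensions and array names below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def cross_modal_attention(text_states, av_states):
    """Text states act as queries attending over audio/video
    keys and values (single head, no learned projections)."""
    d = text_states.shape[-1]
    scores = text_states @ av_states.T / np.sqrt(d)        # (Lt, La)
    # Numerically stable row-wise softmax over the A/V positions.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ av_states                             # (Lt, d)

text = np.random.rand(7, 16)   # 7 token states from the Bi-GRU encoder
av = np.random.rand(5, 16)     # 5 frame states from the LSTM encoder
fused = cross_modal_attention(text, av)
print(fused.shape)  # (7, 16)
```

Each fused row is a convex combination of the audio/video states, so the output stays aligned with the text sequence length while carrying cross-modal context into the decoder.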
Pages: 11
Related Papers
50 records in total
  • [41] BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMERS FOR CYBERBULLYING TEXT DETECTION IN INDONESIAN SOCIAL MEDIA
    Candra, Aswin
    Wella
    Wicaksana, Arya
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2021, 17 (05): : 1599 - 1615
  • [42] On the Dependable Operation of Bidirectional Encoder Representations from Transformers (BERT) in the Presence of Soft Errors
    Gao, Zhen
    Wang, Jingyan
    Su, Rui
    Reviriego, Pedro
    Liu, Shanshan
Lombardi, Fabrizio
    2023 IEEE 23RD INTERNATIONAL CONFERENCE ON NANOTECHNOLOGY, NANO, 2023, : 582 - 586
  • [43] Using Bidirectional Encoder Representations from Transformers (BERT) to predict criminal charges and sentences from Taiwanese court judgments
    Peng, Yi-Ting
    Lei, Chin-Laung
    PEERJ COMPUTER SCIENCE, 2024, 10
  • [44] Automated detection of contractual risk clauses from construction specifications using bidirectional encoder representations from transformers (BERT)
    Moon, Seonghyeon
    Chi, Seokho
    Im, Seok-Been
    AUTOMATION IN CONSTRUCTION, 2022, 142
  • [45] Software solution for text summarisation using machine learning based Bidirectional Encoder Representations from Transformers algorithm
    Al Abdulwahid, Abdulwahid
    IET SOFTWARE, 2023, 17 (04) : 755 - 764
  • [46] Action Recognition in Dark Videos Using Spatio-Temporal Features and Bidirectional Encoder Representations from Transformers
Singh, H.
Suman, S.
Subudhi, B. N.
Jakhetiya, V.
Ghosh, A.
    IEEE Transactions on Artificial Intelligence, 2023, 4 (06): : 1461 - 1471
  • [48] Personality Prediction Based on Text Analytics Using Bidirectional Encoder Representations from Transformers from English Twitter Dataset
    Arijanto, Joshua Evan
    Geraldy, Steven
    Tania, Cyrena
    Suhartono, Derwin
    INTERNATIONAL JOURNAL OF FUZZY LOGIC AND INTELLIGENT SYSTEMS, 2021, 21 (03) : 310 - 316
  • [49] Aspect-based Sentiment Analysis for Bengali Text using Bidirectional Encoder Representations from Transformers (BERT)
    Samia, Moythry Manir
    Rajee, Alimul
    Hasan, Md Rakib
    Faruq, Mohammad Omar
    Paul, Pintu Chandra
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 978 - 986
  • [50] NLP-Based Automatic Summarization using Bidirectional Encoder Representations from Transformers-Long Short Term Memory Hybrid Model: Enhancing Text Compression
    Kartha, Ranju S.
    Agal, Sanjay
    Odedra, Niyati Dhirubhai
    Nanda, Ch Sudipta Kishore
    Rao, Vuda Sreenivasa
    Kuthe, Annaji M.
    Taloba, Ahmed I.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2024, 15 (05) : 1223 - 1236