Multimodal Abstractive Summarization using bidirectional encoder representations from transformers with attention mechanism

Cited by: 1

Authors:

Argade, Dakshata [1]

Khairnar, Vaishali [1]

Vora, Deepali [2]

Patil, Shruti [2,3]

Kotecha, Ketan [3]

Alfarhood, Sultan [4]

Affiliations:

[1] Terna Engn Coll, Navi Mumbai 400706, India

[2] Symbiosis Inst Technol Deemed Univ, Symbiosis Int Technol, Pune Campus, Pune 412115, India

[3] Symbiosis Int Deemed Univ SIU, Symbiosis Inst Technol Pune Campus, Symbiosis Ctr Appl Artificial Intelligence SCAAI, Pune 412115, India

[4] King Saud Univ, Coll Comp & Informat Sci, Dept Comp Sci, POB 51178, Riyadh 51178, Saudi Arabia

Source:

HELIYON | 2024 / Vol. 10 / Issue 04

Keywords:

Attention mechanism; Bidirectional encoder representations from transformer; Decoder; Encoder; Multimodalities; Multimodal abstractive summarization;

DOI:

10.1016/j.heliyon.2024.e26162

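Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes: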

07 ; 0710 ; 09 ;

Abstract:

In recent decades, abstractive text summarization from multimodal input has attracted many researchers because it can gather information from several sources to create a concise summary. However, existing multimodal summarization methods summarize only short videos well and give poor results for lengthy videos. To address these issues, this research presents Multimodal Abstractive Summarization using Bidirectional Encoder Representations from Transformers (MAS-BERT) with an attention mechanism. The purpose of video summarization is to speed up searching a large collection of videos, so that users can quickly decide whether a video is relevant by reading its summary. The data is obtained from the publicly available How2 dataset and is encoded using a Bidirectional Gated Recurrent Unit (Bi-GRU) encoder and a Long Short-Term Memory (LSTM) encoder: the textual data, embedded in the embedding layer, is encoded using the Bi-GRU encoder, and the audio and video features are encoded with the LSTM encoder. A BERT-based attention mechanism then combines the modalities, and finally a Bi-GRU-based decoder generates the summary from the combined modalities. The experimental results show that the proposed MAS-BERT achieves a ROUGE-1 score of 60.2, whereas the existing Decoder-only Multimodal Transformer (DMmT) and the Factorized Multimodal Transformer based Decoder Only Language model (FLORAL) achieve 49.58 and 56.89, respectively. Our work gives users better contextual information and a better user experience, and could help video-sharing platforms retain customers by allowing users to search for relevant videos by looking at their summaries.
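
The ROUGE-1 figures quoted in the abstract measure unigram overlap between a generated summary and a reference summary. The following is a minimal sketch of that metric, not the authors' evaluation code. It assumes lowercased whitespace tokenization with no stemming, and it reports precision, recall, and F1 because the abstract does not say which variant the 60.2 refers to. The function name `rouge_1` and the example sentences are hypothetical and are not taken from the How2 dataset.

```python
from collections import Counter


def rouge_1(reference: str, candidate: str) -> dict:
    """Unigram-overlap ROUGE-1 between a reference and a candidate summary.

    Assumption: lowercased whitespace tokenization, no stemming or
    stopword removal (published toolkits may preprocess differently).
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Clipped overlap: a unigram counts at most as often as it occurs in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / max(sum(ref_counts.values()), 1)
    precision = overlap / max(sum(cand_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical example pair, for illustration only.
scores = rouge_1(
    reference="learn how to tie a bow tie in this video",
    candidate="this video shows how to tie a tie",
)
# Scale to 0-100, the range in which scores such as 60.2 are usually reported.
print({name: round(100 * value, 1) for name, value in scores.items()})
```

In this example the candidate shares 7 unigrams with the 10-token reference and has 8 tokens itself, so recall is 7/10 and precision is 7/8.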

Pages: 11
