Bilingual video captioning model for enhanced video retrieval

Cited by: 1
Authors
Alrebdi, Norah [1 ]
Al-Shargabi, Amal A. [1 ]
Affiliations
[1] Qassim Univ, Coll Comp, Dept Informat Technol, Buraydah 51452, Saudi Arabia
Keywords
Artificial intelligence; Computer vision; Natural language processing; Video retrieval; English video captioning; Arabic video captioning; LANGUAGE; NETWORK; VISION; TEXT
DOI
10.1186/s40537-024-00878-w
Chinese Library Classification (CLC)
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
Many video platforms rely on uploader-provided descriptions for video retrieval, which can make retrieval inaccurate. Deep learning-based video captioning can address this problem, but it has two limitations: (1) traditional keyframe extraction techniques do not consider video length or content, resulting in low accuracy, high storage requirements, and long processing times; and (2) Arabic language support in video captioning is limited. This study proposes a new video captioning approach that uses an efficient keyframe extraction method and supports both Arabic and English. The proposed keyframe extraction technique combines time- and content-based approaches to yield better-quality captions, lower storage requirements, and faster processing. The English and Arabic models use a sequence-to-sequence framework with long short-term memory (LSTM) in both the encoder and decoder. Caption quality for both models was evaluated with four metrics: Bilingual Evaluation Understudy (BLEU), Metric for Evaluation of Translation with Explicit ORdering (METEOR), Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L), and Consensus-based Image Description Evaluation (CIDEr). The models were also evaluated with cosine similarity to determine their suitability for video retrieval. The results show that the English model performed better in both caption quality and video retrieval: it scored 47.18 BLEU, 30.46 METEOR, 62.07 ROUGE-L, and 59.98 CIDEr, whereas the Arabic model scored 21.65, 36.30, 44.897, and 45.52, respectively. In the video retrieval evaluation, the English and Arabic models successfully retrieved 67% and 40% of the videos, respectively, at a 20% similarity threshold. These models have potential applications in storytelling, sports commentary, and video surveillance.
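To make the abstract's two key mechanisms concrete, below is a minimal sketch (not the authors' code) of hybrid time- and content-based keyframe extraction and of a bag-of-words cosine-similarity check between a generated caption and a stored description. It assumes OpenCV; the sampling interval, histogram settings, similarity threshold, and all function names are illustrative choices, not values from the paper.

```python
# Illustrative sketch of (1) time + content keyframe extraction and
# (2) caption-vs-description cosine similarity. All parameters are
# assumptions for demonstration, not the paper's settings.
import math
from collections import Counter

import cv2  # OpenCV, for frame decoding and color histograms


def extract_keyframes(video_path, seconds_between_samples=1.0,
                      similarity_threshold=0.9):
    """Sample one frame per `seconds_between_samples` (time-based),
    then keep a sample only if its color histogram differs enough
    from the last kept keyframe (content-based)."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS unknown
    step = max(1, int(round(fps * seconds_between_samples)))
    keyframes, last_hist, index = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:  # time-based sampling
            hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
            hist = cv2.calcHist([hsv], [0, 1], None, [50, 60],
                                [0, 180, 0, 256])
            cv2.normalize(hist, hist)
            # Content-based gate: keep the frame only if it is not too
            # similar to the previously kept keyframe.
            if last_hist is None or cv2.compareHist(
                    last_hist, hist,
                    cv2.HISTCMP_CORREL) < similarity_threshold:
                keyframes.append(frame)
                last_hist = hist
        index += 1
    cap.release()
    return keyframes


def cosine_similarity(caption, description):
    """Bag-of-words cosine similarity, a simple stand-in for comparing
    a generated caption against a video's stored description."""
    a = Counter(caption.lower().split())
    b = Counter(description.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```

In a real retrieval pipeline the bag-of-words vectors would be replaced by whatever text representation the system actually uses; the histogram-correlation gate is simply one common way to realize "content-based" filtering on top of fixed-interval sampling.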
Pages: 24