Deep Learning for Video Captioning: A Review

Cited by: 0

Authors:

Chen, Shaoxiang ^{[1]}

Yao, Ting ^{[3]}

Jiang, Yu-Gang ^{[1,2]}

Affiliations:

[1] Fudan Univ, Sch Comp Sci, Shanghai Key Lab Intelligent Info Proc, Shanghai, Peoples R China

[2] Jilian Technol Grp Video, Shanghai, Peoples R China

[3] JD AI Res, Shanghai, Peoples R China

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2019

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Deep learning has recently achieved great success in solving specific artificial intelligence problems, and substantial progress has been made in Computer Vision (CV) and Natural Language Processing (NLP). As a connection between the two worlds of vision and language, video captioning is the task of producing a natural-language utterance (usually a sentence) that describes the visual content of a video. The task decomposes naturally into two sub-tasks. The first is to encode a video by thoroughly understanding it and learning a visual representation. The second is caption generation, which decodes the learned representation into a sentence, word by word. In this survey, we first formulate the problem of video captioning, then review state-of-the-art methods categorized by their emphasis on vision or language, followed by a summary of standard datasets and representative approaches. Finally, we highlight the challenges in this task that are not yet fully understood and present future research directions.


Pages: 6283-6290

Page count: 8

