Multimodal object description network for dense captioning

Cited by: 2
Authors
Wang, Weixuan [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
DOI
10.1049/el.2017.0326
CLC classification
TM (Electrical technology); TN (Electronic and communication technology);
Discipline codes
0808; 0809;
Abstract
A new multimodal object description network (MODN) model for dense captioning is proposed. The proposed model is constructed from a vision module and a language module. In the vision module, a modified Faster R-CNN (regions with convolutional neural network features) is used to detect salient objects and extract their inherent features. The language module combines semantic features with the object features obtained from the vision module and calculates the probability distribution over each word in the sentence. Compared with existing methods, the proposed MODN framework adopts a multimodal layer that can effectively extract discriminant information from both object and semantic features. Moreover, MODN can generate object descriptions rapidly without external region proposals. The effectiveness of MODN is verified on the well-known VOC2007 and Visual Genome datasets.
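The fusion step described in the abstract, where a multimodal layer combines an object feature with a semantic feature and scores the vocabulary, can be sketched as follows. This is a toy illustration only, not the authors' implementation: the dimensions, weight matrices, and function names are invented, and the weights are random rather than learned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (the paper does not specify these values)
d_obj, d_sem, d_mm, vocab = 512, 256, 128, 1000

# Hypothetical multimodal-layer weights (randomly initialised, not trained)
W_obj = rng.normal(scale=0.01, size=(d_mm, d_obj))  # projects object features
W_sem = rng.normal(scale=0.01, size=(d_mm, d_sem))  # projects semantic features
W_out = rng.normal(scale=0.01, size=(vocab, d_mm))  # scores the vocabulary

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def next_word_distribution(obj_feat, sem_feat):
    """Fuse object and semantic features in a multimodal layer,
    then return a probability distribution over the vocabulary."""
    fused = np.tanh(W_obj @ obj_feat + W_sem @ sem_feat)
    return softmax(W_out @ fused)

# Stand-ins for a detected region's feature and a word's semantic feature
obj_feat = rng.normal(size=d_obj)
sem_feat = rng.normal(size=d_sem)
p = next_word_distribution(obj_feat, sem_feat)
```

In the actual model the object feature would come from the modified Faster R-CNN and the semantic feature from the language module's recurrent state; here both are random vectors that only fix the shapes.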
Pages: 1041+ (2 pages)
Related papers (50 total)
  • [21] Hierarchical Context-aware Network for Dense Video Event Captioning
    Ji, Lei
    Guo, Xianglin
    Huang, Haoyang
    Chen, Xilin
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2004 - 2013
  • [22] Scene-Graph-Guided message passing network for dense captioning
    Liu, An-An
    Wang, Yanhui
    Xu, Ning
    Liu, Shan
    Li, Xuanya
    PATTERN RECOGNITION LETTERS, 2021, 145 : 187 - 193
  • [23] A Hierarchical Multimodal Attention-based Neural Network for Image Captioning
    Cheng, Yong
    Huang, Fei
    Zhou, Lian
    Jin, Cheng
    Zhang, Yuejie
    Zhang, Tao
    SIGIR'17: PROCEEDINGS OF THE 40TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2017, : 889 - 892
  • [24] Multimodal Deep Neural Network with Image Sequence Features for Video Captioning
    Oura, Soichiro
    Matsukawa, Tetsu
    Suzuki, Einoshin
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [25] Survey of Dense Video Captioning
    Huang, Xiankai
    Zhang, Jiayu
    Wang, Xinyu
    Wang, Xiaochuan
    Liu, Ruijun
Computer Engineering and Applications, 2023, 59 (12): 28 - 48
  • [26] Multirate Multimodal Video Captioning
    Yang, Ziwei
    Xu, Youjiang
    Wang, Huiyun
    Wang, Bo
    Han, Yahong
    PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17), 2017, : 1877 - 1882
  • [27] Dense Image Captioning in Hindi
    Gill, Karanjit
    Saha, Sriparna
    Mishra, Santosh Kumar
    2021 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2021, : 2894 - 2899
  • [28] Streamlined Dense Video Captioning
    Mun, Jonghwan
    Yang, Linjie
    Ren, Zhou
    Xu, Ning
    Han, Bohyung
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 3581 - +
  • [30] MIVCN: Multimodal interaction video captioning network based on semantic association graph
    Wang, Ying
    Huang, Guoheng
Lin, Yuming
    Yuan, Haoliang
    Pun, Chi-Man
    Ling, Wing-Kuen
    Cheng, Lianglun
    APPLIED INTELLIGENCE, 2022, 52 (05) : 5241 - 5260