Multimodal object description network for dense captioning

Cited by: 2
Authors
Wang, Weixuan [1 ]
Hu, Haifeng [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou 510006, Guangdong, Peoples R China
Keywords
DOI
10.1049/el.2017.0326
CLC classification
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808 ; 0809 ;
Abstract
A new multimodal object description network (MODN) model for dense captioning is proposed. The model comprises a vision module and a language module. The vision module uses a modified Faster Region-based Convolutional Neural Network (Faster R-CNN) to detect salient objects and extract their inherent features. The language module combines semantic features with the object features obtained from the vision module and calculates the probability distribution of each word in the sentence. Unlike existing methods, the proposed MODN framework adopts a multimodal layer that effectively extracts discriminative information from both object and semantic features. Moreover, MODN can generate object descriptions rapidly without an external region proposal stage. The effectiveness of MODN is verified on the well-known VOC2007 and Visual Genome datasets.
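The multimodal layer described above can be sketched in a few lines: it fuses an object feature vector (e.g. from a Faster R-CNN region) with a semantic feature vector (e.g. the embedding of the previous word) and maps the result to a probability distribution over the vocabulary. This is a minimal illustrative sketch, not the authors' implementation; all dimensions, weight initialisations, and the choice of tanh fusion are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the abstract does not specify them.
D_OBJ, D_SEM, D_MULTI, VOCAB = 512, 256, 128, 1000

# Randomly initialised matrices stand in for learned parameters.
W_obj = rng.normal(0.0, 0.01, (D_OBJ, D_MULTI))
W_sem = rng.normal(0.0, 0.01, (D_SEM, D_MULTI))
W_out = rng.normal(0.0, 0.01, (D_MULTI, VOCAB))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def multimodal_layer(obj_feat, sem_feat):
    """Fuse object and semantic features, then score every vocabulary word."""
    fused = np.tanh(obj_feat @ W_obj + sem_feat @ W_sem)  # multimodal fusion
    return softmax(fused @ W_out)  # probability of each word in the sentence

obj_feat = rng.normal(size=D_OBJ)  # e.g. a detected object's feature vector
sem_feat = rng.normal(size=D_SEM)  # e.g. embedding of the previous word
probs = multimodal_layer(obj_feat, sem_feat)
print(probs.shape)  # (1000,); the entries sum to 1
```

At generation time such a layer would be applied once per time step, feeding the sampled word's embedding back in as the next semantic feature.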
Pages: 1041 / +
Page count: 2
Related papers
50 records in total
  • [41] Context and Attribute Grounded Dense Captioning
    Yin, Guojun
    Sheng, Lu
    Liu, Bin
    Yu, Nenghai
    Wang, Xiaogang
    Shao, Jing
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6234 - 6243
  • [42] An Efficient Framework for Dense Video Captioning
    Suin, Maitreya
    Rajagopalan, A. N.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 12039 - 12046
  • [43] Image Graph Production by Dense Captioning
    Sahba, Amin
    Das, Arun
    Rad, Paul
    Jamshidi, Mo
    2018 WORLD AUTOMATION CONGRESS (WAC), 2018, : 193 - 198
  • [44] MMT: A Multimodal Translator for Image Captioning
    Liu, Chang
    Sun, Fuchun
    Wang, Changhu
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, PT II, 2017, 10614 : 784 - 784
  • [45] Multimodal Image Captioning for Marketing Analysis
    Harzig, Philipp
    Brehm, Stephan
    Lienhart, Rainer
    Kaiser, Carolin
    Schallner, Rene
    IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 158 - 161
  • [46] Multimodal Feature Learning for Video Captioning
    Lee, Sujin
    Kim, Incheol
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [47] Improving multimodal datasets with image captioning
    Thao Nguyen
    Gadre, Samir Yitzhak
    Ilharco, Gabriel
    Oh, Sewoong
    Schmidt, Ludwig
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023
  • [48] Deep multimodal embedding for video captioning
    Lee, Jin Young
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (22) : 31793 - 31805
  • [49] A multimodal fusion approach for image captioning
    Zhao, Dexin
    Chang, Zhi
    Guo, Shutao
    NEUROCOMPUTING, 2019, 329 : 476 - 485
  • [50] Dense-Captioning Events in Videos
    Krishna, Ranjay
    Hata, Kenji
    Ren, Frederic
    Fei-Fei, Li
    Niebles, Juan Carlos
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 706 - 715