Look Deeper See Richer: Depth-aware Image Paragraph Captioning

Cited by: 20

Authors:

Wang, Ziwei [1]

Luo, Yadan [1]

Li, Yang [1]

Huang, Zi [1]

Yin, Hongzhi [1]

Affiliation:

[1] Univ Queensland, Sch Informat Technol & Elect Engn, Brisbane, Qld, Australia

Source:

PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018

Funding:

Australian Research Council;

Keywords:

Paragraph Captioning; Depth Estimation; Attention Mechanism;

DOI:

10.1145/3240508.3240583

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Although image captioning at the sentence level is now widely available, automatically generating image paragraphs has yet to be well explored. Describing an image with a full paragraph involves organising sentences in an orderly, coherent and diverse way, which inevitably leads to higher complexity than describing it with a single sentence. Existing image paragraph captioning methods give a series of sentences to represent the objects and regions of interest, where the descriptions are essentially generated by feeding the image fragments containing objects and regions into conventional single-sentence image captioning models. This strategy struggles to generate descriptions that guarantee the stereoscopic hierarchy and non-overlapping objects. In this paper, we propose a Depth-aware Attention Model (DAM) to generate paragraph captions for images. The depths of image areas are first estimated in order to discriminate objects across a range of spatial locations, which can further guide the linguistic decoder to reveal spatial relationships among objects. This model completes the paragraph in a logical and coherent manner. By incorporating the attention mechanism, the learned model swiftly shifts the sentence focus during paragraph generation, whilst avoiding verbose descriptions of the same object. Extensive quantitative experiments and a user study have been conducted on the Visual Genome dataset, which demonstrate the effectiveness and interpretability of the proposed model.


Pages: 672-680

Page count: 9

21 items in total

[1] Depth-Aware Image Seam Carving
Shen, Jianbing
Wang, Dapeng
Li, Xuelong
[J]. IEEE TRANSACTIONS ON CYBERNETICS, 2013, 43 (05) : 1453 - 1461
[2] Depth-Aware Image Colorization Network
Chu, Wei-Ta
Hsu, Yu-Ting
[J]. PROCEEDINGS OF THE 2018 WORKSHOP ON UNDERSTANDING SUBJECTIVE ATTRIBUTES OF DATA, WITH THE FOCUS ON EVOKED EMOTIONS (EE-USAD'18), 2018 : 17 - 23
[3] Depth-aware image vectorization and editing
Shufang Lu
Wei Jiang
Xuefeng Ding
Craig S. Kaplan
Xiaogang Jin
Fei Gao
Jiazhou Chen
[J]. The Visual Computer, 2019, 35 : 1027 - 1039
[4] Depth-aware image vectorization and editing
Lu, Shufang
Jiang, Wei
Ding, Xuefeng
Kaplan, Craig S.
Jin, Xiaogang
Gao, Fei
Chen, Jiazhou
[J]. VISUAL COMPUTER, 2019, 35 (6-8) : 1027 - 1039
[5] Interactive Depth-Aware Effects for Stereo Image Editing
Abbott, Joshua
Morse, Bryan
[J]. 2013 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2013), 2013 : 263 - 270
[6] Deep Image Registration With Depth-Aware Homography Estimation
Huang, Chenwei
Pan, Xiong
Cheng, Jingchun
Song, Jiajie
[J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 6 - 10
[7] Learning depth-aware decomposition for single image dehazing
Kang, Yumeng
Zhang, Lu
Hu, Ping
Liu, Yu
Lu, Huchuan
He, You
[J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 248
[8] REINFORCED DEPTH-AWARE DEEP LEARNING FOR SINGLE IMAGE DEHAZING
Guo, Tiantong
Monga, Vishal
[J]. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020 : 8891 - 8895
[9] Salient object segmentation based on depth-aware image layering
Du, Huan
Liu, Zhi
Shi, Ran
[J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (09) : 12125 - 12138
[10] Depth-aware total variation regularization for underwater image dehazing
Ding, Xueyan
Liang, Zheng
Wang, Yafei
Fu, Xianping
[J]. SIGNAL PROCESSING-IMAGE COMMUNICATION, 2021, 98
