Look Deeper See Richer: Depth-aware Image Paragraph Captioning

Cited by: 20
Authors
Wang, Ziwei [1]
Luo, Yadan [1]
Li, Yang [1]
Huang, Zi [1]
Yin, Hongzhi [1]
Affiliation
[1] University of Queensland, School of Information Technology and Electrical Engineering, Brisbane, QLD, Australia
Funding
Australian Research Council;
Keywords
Paragraph Captioning; Depth Estimation; Attention Mechanism;
DOI
10.1145/3240508.3240583
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Subject classification code
081202;
Abstract
Although image captioning at the sentence level is now widely available, how to automatically generate image paragraphs has yet to be well explored. Describing an image with a full paragraph involves organising sentences in an orderly, coherent and diverse manner, which inevitably leads to higher complexity than describing it with a single sentence. Existing image paragraph captioning methods produce a series of sentences to represent the objects and regions of interest, where the descriptions are essentially generated by feeding the image fragments containing those objects and regions into conventional single-sentence image captioning models. This strategy has difficulty generating descriptions that guarantee the stereoscopic hierarchy and non-overlapping objects. In this paper, we propose a Depth-aware Attention Model (DAM) to generate paragraph captions for images. The depths of image areas are first estimated in order to discriminate objects across a range of spatial locations, which further guides the linguistic decoder to reveal spatial relationships among objects. This model completes the paragraph in a logical and coherent manner. By incorporating the attention mechanism, the learned model swiftly shifts the sentence focus during paragraph generation, whilst avoiding verbose descriptions of the same object. Extensive quantitative experiments and a user study have been conducted on the Visual Genome dataset, which demonstrate the effectiveness and interpretability of the proposed model.
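The abstract only outlines how estimated depth and attention interact. As a rough illustration, the sketch below performs one attention step over image regions in which each region's key is its visual feature with an estimated depth value appended, so the decoder state can attend by depth as well as by appearance. To mimic "avoiding verbose descriptions of the same object", it adds a coverage-style penalty (a technique borrowed from coverage attention, not confirmed as part of DAM) that down-weights regions already attended for earlier sentences. The function `depth_aware_attention`, its parameters (`coverage`, `coverage_weight`) and the dot-product scoring are hypothetical assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def depth_aware_attention(
    query: Sequence[float],
    region_feats: Sequence[Sequence[float]],
    region_depths: Sequence[float],
    coverage: Sequence[float],
    coverage_weight: float = 1.0,
) -> Tuple[List[float], List[float], List[float]]:
    """One attention step over image regions (hypothetical sketch, not DAM itself).

    Each region key is its visual feature with the estimated depth appended,
    so `query` must have len(feature) + 1 entries. `coverage` holds the
    attention each region has already received for earlier sentences; it is
    subtracted (scaled by `coverage_weight`) to discourage re-describing it.
    Returns (attention weights, context vector, updated coverage).
    """
    if not region_feats:
        raise ValueError("need at least one region")
    dim = len(region_feats[0])
    if len(query) != dim + 1:
        raise ValueError("query must have feature dimension + 1 entries")

    scores = []
    for feat, depth, cov in zip(region_feats, region_depths, coverage):
        key = list(feat) + [depth]  # depth-augmented key for this region
        dot = sum(q * k for q, k in zip(query, key))
        scores.append(dot - coverage_weight * cov)  # coverage-style penalty

    weights = softmax(scores)
    # Context vector: attention-weighted sum of the visual features only.
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_feats))
        for d in range(dim)
    ]
    # Accumulate attention so later sentences can avoid these regions.
    new_coverage = [c + w for c, w in zip(coverage, weights)]
    return weights, context, new_coverage
```

For example, with two regions at depths 0.2 and 0.9 and a query `[0.0, 0.0, 1.0]` that reads only the depth slot, a first call with zero coverage puts more weight on the farther region; passing the returned coverage into a second call with `coverage_weight=3.0` shifts the weight to the nearer region, which is the kind of focus shift between sentences the abstract describes.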
Pages: 672-680
Number of pages: 9