Deep Convolutional Neural Network for Bidirectional Image-Sentence Mapping

Cited by: 1
Authors
Yu, Tianyuan [1]
Bai, Liang [1]
Guo, Jinlin [1]
Yang, Zheng [1]
Xie, Yuxiang [1]
Affiliation
[1] National University of Defense Technology, College of Information Systems and Management, Changsha 410073, Hunan, People's Republic of China
DOI
10.1007/978-3-319-51814-5_12
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
With the rapid development of the Internet and the explosion of data volume, it is important to access cross-media big data, including text, images, audio, and video, efficiently and accurately. However, content heterogeneity and the semantic gap make it challenging to retrieve such cross-media archives. Existing approaches try to learn the connection between multiple modalities by directly using hand-crafted low-level features, and the learned correlations are constructed merely from high-level feature representations, without considering semantic information. To further exploit the intrinsic structure of multimodal data representations, it is essential to build an interpretable correlation between these heterogeneous representations. This paper proposes a deep model that uses a convolutional neural network (CNN) to learn a high-level feature representation shared by different modalities, such as text and images. Moreover, the learned CNN features can reflect the salient objects as well as the details in the images and sentences. Experimental results demonstrate that the proposed approach outperforms current state-of-the-art baseline methods on the public Flickr8K dataset.
Pages: 136-147
Page count: 12