Image Captioning with Deep Bidirectional LSTMs and Multi-Task Learning

Cited by: 74
Authors
Wang, Cheng [1]
Yang, Haojin [1]
Meinel, Christoph [1]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Prof Dr Helmert Str 2-3, D-14482 Potsdam, Germany
Keywords
Deep learning; LSTM; multimodal representations; image captioning; multi-task learning
DOI
10.1145/3115432
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Generating a novel and descriptive caption for an image is drawing increasing interest in the computer vision, natural language processing, and multimedia communities. In this work, we propose an end-to-end trainable deep bidirectional Long Short-Term Memory (Bi-LSTM) model to address the problem. By combining a deep convolutional neural network (CNN) and two separate LSTM networks, our model is capable of learning long-term visual-language interactions by making use of history and future context information in a high-level semantic space. We also explore deep multimodal bidirectional models, in which we increase the depth of nonlinearity transition in different ways to learn hierarchical visual-language embeddings. Data augmentation techniques such as multi-crop, multi-scale, and vertical mirror are proposed to prevent over-fitting when training deep models. To understand how our models "translate" an image into a sentence, we visualize and qualitatively analyze the evolution of Bi-LSTM internal states over time. The effectiveness and generality of the proposed models are evaluated on four benchmark datasets: Flickr8K, Flickr30K, MSCOCO, and Pascal1K. We demonstrate that Bi-LSTM models achieve highly competitive performance on both caption generation and image-sentence retrieval, even without integrating an additional mechanism (e.g., object detection, attention model). Our experiments also show that multi-task learning helps to increase model generality and improve performance. We also demonstrate that the transfer-learning performance of the Bi-LSTM model significantly exceeds that of previous methods on the Pascal1K dataset.
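The data augmentation named in the abstract (multi-crop, multi-scale, vertical mirror) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the five crop positions (four corners plus centre), the nearest-neighbour rescaling, and the reading of "vertical mirror" as a reflection about the vertical axis (a left-right flip) are all assumptions. Images are plain nested lists of pixel values to keep the example self-contained.

```python
from typing import List

# Assumption: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


def crop(img: Image, top: int, left: int, size: int) -> Image:
    """Cut a size x size patch whose upper-left corner is (top, left)."""
    return [row[left:left + size] for row in img[top:top + size]]


def mirror(img: Image) -> Image:
    """Reflect about the vertical axis (assumed meaning of 'vertical mirror')."""
    return [row[::-1] for row in img]


def rescale(img: Image, size: int) -> Image:
    """Nearest-neighbour resize to size x size (a stand-in for real resampling)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def augment(img: Image, crop_size: int, scales: List[int]) -> List[Image]:
    """For each scale, take four corner crops and a centre crop, plus mirrors."""
    out: List[Image] = []
    for s in scales:
        if s < crop_size:
            raise ValueError("each scale must be at least crop_size")
        scaled = rescale(img, s)
        last = s - crop_size          # offset of the far-edge crops
        centre = last // 2            # offset of the centre crop
        positions = [(0, 0), (0, last), (last, 0), (last, last),
                     (centre, centre)]
        for top, left in positions:
            patch = crop(scaled, top, left, crop_size)
            out.append(patch)
            out.append(mirror(patch))
    return out


if __name__ == "__main__":
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    views = augment(image, crop_size=2, scales=[4, 6])
    print(len(views), "augmented views")
```

Under these assumptions each input image yields ten views per scale (five crops, each with its mirror), so a single training image contributes many slightly different inputs, which is the over-fitting remedy the abstract refers to.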
Pages: 20
Related papers
50 in total
  • [31] Multi-task Learning with Bidirectional Language Models for Text Classification
    Yang, Qi
    Shang, Lin
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [32] Bidirectional Domain Adaptation Using Weighted Multi-Task Learning
    Dakota, Daniel
    Sayyed, Zeeshan Ali
    Kuebler, Sandra
    IWPT 2021: THE 17TH INTERNATIONAL CONFERENCE ON PARSING TECHNOLOGIES: PROCEEDINGS OF THE CONFERENCE (INCLUDING THE IWPT 2021 SHARED TASK), 2021, : 93 - 105
  • [33] Deep correlation mining for multi-task image clustering
    Yan, Xiaoqiang
    Shi, Kaiyuan
    Ye, Yangdong
    Yu, Hui
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 187
  • [34] Deep Learning-Based Image Geolocation for Travel Recommendation via Multi-Task Learning
    Gu, Fangfang
    Jiang, Keshen
    Hu, Xiaoyi
    Yang, Jie
    JOURNAL OF CIRCUITS SYSTEMS AND COMPUTERS, 2022, 31 (07)
  • [35] Adversarial Learning Guided Task Relatedness Refinement for Multi-Task Deep Learning
    Fang, Yuchun
    Cai, Sirui
    Cao, Yiting
    Li, Zhengchen
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 6946 - 6957
  • [36] Optimization of Deep Reinforcement Learning with Hybrid Multi-Task Learning
    Varghese, Nelson Vithayathil
    Mahmoud, Qusay H.
    2021 15TH ANNUAL IEEE INTERNATIONAL SYSTEMS CONFERENCE (SYSCON 2021), 2021,
  • [37] Improving Evidential Deep Learning via Multi-Task Learning
    Oh, Dongpin
    Shin, Bonggun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 7895 - 7903
  • [38] Multi-task gradient descent for multi-task learning
    Bai, Lu
    Ong, Yew-Soon
    He, Tiantian
    Gupta, Abhishek
    MEMETIC COMPUTING, 2020, 12 (04) : 355 - 369
  • [40] Bacterial image analysis using multi-task deep learning approaches for clinical microscopy
    Chin, Shuang Yee
    Dong, Jian
    Hasikin, Khairunnisa
    Ngui, Romano
    Lai, Khin Wee
    Yeoh, Pauline Shan Qing
    Wu, Xiang
    PEERJ COMPUTER SCIENCE, 2024, 10