Style-Enhanced Transformer for Image Captioning in Construction Scenes

Cited: 0
Authors
Song, Kani [1 ]
Chen, Linlin [1 ]
Wang, Hengyou [1 ]
Affiliations
[1] Beijing Univ Civil Engn & Architecture, Sch Sci, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
image captioning; construction scene; style feature; transformer;
DOI
10.3390/e26030224
CLC Number
O4 [Physics];
Subject Classification Code
0702;
Abstract
Image captioning is important for improving the intelligence of construction projects and for helping managers monitor construction site activities. However, few image-captioning models currently target construction scenes, and existing methods perform poorly in complex construction scenes. Guided by the characteristics of construction scenes, we annotate a text-description dataset based on the MOCS dataset and propose a style-enhanced Transformer for image captioning in construction scenes, called SETCAP. Specifically, we extract grid features with a Swin Transformer. To enhance style information, we use the grid features both as the initial detailed semantic features and as the input to a style encoder that extracts style information. In the decoder, we integrate this style information into the text features, and the interaction between the image semantic information and the text features generates content-appropriate sentences word by word. Finally, we add a sentence-style loss to the total loss function so that the style of the generated sentences is closer to that of the training set. Experimental results show that the proposed method achieves encouraging results on both the MSCOCO and MOCS datasets. In particular, SETCAP outperforms state-of-the-art methods by 4.2% in CIDEr score on the MOCS dataset and by 3.9% in CIDEr score on the MSCOCO dataset.
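The pipeline the abstract describes (grid features from a backbone, a style encoder, style-conditioned decoding, and a total loss combining a caption loss with a sentence-style loss) can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the dimensions, the pooling-based style encoder, the additive style injection, and the weight `LAMBDA_STYLE` are all placeholder assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper).
N_GRID, D = 49, 512      # number and size of grid features from the backbone
T, V = 12, 1000          # caption length, vocabulary size
D_STYLE = 64             # style-embedding size (assumed)
LAMBDA_STYLE = 0.1       # weight of the sentence-style loss (assumed)

# 1) Random tensors stand in for Swin-Transformer grid features.
grid_feats = rng.standard_normal((N_GRID, D))

# 2) Style encoder (sketch): pool the grid features, project to a style vector.
W_style = rng.standard_normal((D, D_STYLE)) / np.sqrt(D)
style_vec = np.tanh(grid_feats.mean(axis=0) @ W_style)

# 3) Decoder (sketch): inject the style vector into each text feature, then
#    score the vocabulary. A real decoder would also attend over grid_feats.
text_feats = rng.standard_normal((T, D_STYLE))
fused = text_feats + style_vec                 # style-conditioned text features
W_out = rng.standard_normal((D_STYLE, V)) / np.sqrt(D_STYLE)
logits = fused @ W_out

# 4) Caption loss: token-level cross-entropy against reference tokens.
ref_tokens = rng.integers(0, V, size=T)
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
caption_loss = -log_probs[np.arange(T), ref_tokens].mean()

# 5) Sentence-style loss (sketch): pull the predicted style vector toward the
#    mean style of the training captions (a random placeholder here).
train_style_mean = rng.standard_normal(D_STYLE) * 0.1
style_loss = np.mean((style_vec - train_style_mean) ** 2)

# 6) Total loss, as in the abstract: caption loss plus weighted style loss.
total_loss = caption_loss + LAMBDA_STYLE * style_loss
print(float(total_loss))
```

With random weights the caption loss sits near the uniform baseline log(V), and the style term adds a small penalty scaled by `LAMBDA_STYLE`; training would minimize the sum so that generated sentences match both the content and the style of the training captions.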
Pages: 18