Traffic Scene Captioning with Multi-Stage Feature Enhancement

Cited by: 1

Authors:

Zhang, Dehai [1]

Ma, Yu [1]

Liu, Qing [1]

Wang, Haoxing [1]

Ren, Anquan [1]

Liang, Jiashu [1]

Affiliations:

[1] Yunnan Univ, Sch Software, Kunming 650091, Peoples R China

Source:

CMC-COMPUTERS MATERIALS & CONTINUA | 2023 / Vol. 76 / Issue 03

Keywords:

Traffic scene captioning; sustainable transportation; feature enhancement; encoder-decoder structure; multi-level granularity; scene knowledge graph

DOI:

10.32604/cmc.2023.038264

Chinese Library Classification number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Traffic scene captioning technology automatically generates one or more sentences to describe the content of traffic scenes by analyzing the content of the input traffic scene images, ensuring road safety while providing an important decision-making function for sustainable transportation. In order to provide a comprehensive and reasonable description of complex traffic scenes, a traffic scene semantic captioning model with multi-stage feature enhancement is proposed in this paper. In general, the model follows an encoder-decoder structure. First, multilevel granularity visual features are used for feature enhancement during the encoding process, which enables the model to learn more detailed content in the traffic scene image. Second, the scene knowledge graph is applied to the decoding process, and the semantic features provided by the scene knowledge graph are used to enhance the features learned by the decoder again, so that the model can learn the attributes of objects in the traffic scene and the relationships between objects to generate more reasonable captions. This paper reports extensive experiments on the challenging MS-COCO dataset, evaluated by five standard automatic evaluation metrics, and the results show that the proposed model has improved significantly in all metrics compared with the state-of-the-art methods, especially achieving a score of 129.0 on the CIDEr-D evaluation metric, which also indicates that the proposed model can effectively provide a more reasonable and comprehensive description of the traffic scene.

Citation

Pages: 2901-2920

Page count: 20

50 items in total

[1] Swin-Caption: Swin Transformer-Based Image Captioning with Feature Enhancement and Multi-Stage Fusion
Liu, Lei
Jiao, Yidi
Li, Xiaoran
Li, Jing
Wang, Haitao
Cao, Xinyu
INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE AND APPLICATIONS, 2024,
[2] Multi-Stage Feature Interaction Model with Abundant Semantic Information for Image Captioning
Li, Xueting
An, Gaoyun
Ruan, Qiuqi
PROCEEDINGS OF 2020 IEEE 15TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING (ICSP 2020), 2020, : 407 - 410
[3] MuSeFFF: Multi-stage feature fusion framework for traffic prediction
Kumar A.
Sunitha R.
Intelligent Systems with Applications, 2023, 18
[4] Multi-Stage Multi-Task Feature Learning
Gong, Pinghua
Ye, Jieping
Zhang, Changshui
JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 2979 - 3010
[5] Multi-stage multi-task feature learning
Gong, Pinghua
Ye, Jieping
Zhang, Changshui
Journal of Machine Learning Research, 2013, 14 : 2979 - 3010
[6] CNN Pruning with Multi-Stage Feature Decorrelation
Zhu, Qiuyu
Liu, Chengfei
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2024, 38 (15)
[7] Multi-stage convex relaxation for feature selection
Zhang, Tong
BERNOULLI, 2013, 19 (5B) : 2277 - 2293
[8] Multi-stage Feature Selection for On-Line Flow Peer-to-Peer Traffic Identification
Abdalla, Bushra Mohammed Ali
Jamil, Haitham A.
Hamdan, Mosab
Bassi, Joseph Stephen
Ismail, Ismahani
Marsono, Muhammad Nadzir
MODELING, DESIGN AND SIMULATION OF SYSTEMS, ASIASIM 2017, PT II, 2017, 752 : 509 - 523
[9] Multi-stage Progressive Speech Enhancement Network
Xu, Xinmeng
Wang, Yang
Xu, Dongxiang
Peng, Yiyuan
Zhang, Cong
Jia, Jie
Chen, Binbin
INTERSPEECH 2021, 2021, : 2691 - 2695
[10] Multi-Stage Feature Enhancement Pyramid Network for Detecting Objects in Optical Remote Sensing Images
Zhang, Kaihua
Shen, Haikuo
REMOTE SENSING, 2022, 14 (03)
