Real-time urban street view semantic segmentation based on cross-layer aggregation network

Authors
Hou Z. [1 ,2 ]
Cheng M. [1 ,2 ]
Ma S. [1 ,2 ]
Qu M. [1 ,2 ]
Yang X. [1 ,2 ]
Affiliations
[1] Xi′an University of Posts and Telecommunications, Institute of Computer, Xi′an
[2] Xi′an University of Posts and Telecommunications, Shaanxi Key Laboratory of Network Data Analysis and Intelligent Processing, Xi′an
Keywords
convolutional neural network; encoder-decoder structure; pyramid pooling module; semantic segmentation; urban street view
DOI
10.37188/OPE.20243208.1212
Abstract
With the rapid development of autonomous driving technology, precise and efficient scene understanding has become increasingly important. Urban street scene semantic segmentation aims to accurately identify and segment elements such as pedestrians, obstacles, roads, and signs, providing the road information that autonomous driving requires. However, current semantic segmentation algorithms still face challenges in urban street scenes, chiefly insufficient discrimination between pixels of different categories, inaccurate understanding of complex scene structures, and inaccurate segmentation of small-scale objects or large-scale structures. To address these issues, this paper proposes a real-time urban street scene semantic segmentation algorithm based on a cross-layer aggregation network. First, a pyramid pooling module combined with cross-layer aggregation is placed at the end of the encoder to efficiently extract multi-scale context information. Second, a cross-layer aggregation module is placed between the encoder and decoder; it strengthens feature representation through a channel attention mechanism and progressively aggregates features from the encoder stages to achieve feature reuse. Finally, a multi-scale fusion module in the decoder stage aggregates global and local information along the channel dimension to promote the fusion of deep and shallow features. The proposed algorithm was validated on two common urban street scene datasets. On an RTX 3090 graphics card (with speed measured under TensorRT), the algorithm achieves 73.0% mIoU on the Cityscapes test set at 294 FPS, and 75.8% mIoU on higher-resolution images at 164 FPS; on the CamVid dataset, it achieves 74.8% mIoU at 239 FPS. Experimental results show that the proposed algorithm effectively balances accuracy and real-time performance, significantly improving semantic segmentation performance compared with other algorithms and bringing new breakthroughs to the field of real-time urban street scene semantic segmentation. © 2024 Chinese Academy of Sciences. All rights reserved.
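The accuracy figures in the abstract are reported as mIoU (mean intersection over union). As a rough illustration of how that metric is commonly computed, the following is a minimal pure-Python sketch and not the authors' evaluation code. The function name `mean_iou`, the flat label lists, and the `ignore_index=255` default (a convention often used for unlabeled Cityscapes pixels) are assumptions made for this example.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union over two flat sequences of class labels.

    Hypothetical helper for illustration; `ignore_index` marks target pixels
    that are excluded from the score (an assumed convention).
    """
    # confusion[t][p] counts pixels whose ground truth is t and prediction is p.
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                    # class c missed
        fp = sum(row[c] for row in confusion) - tp     # other classes labeled c
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy example: six pixels, three classes, last pixel unlabeled.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 255]
    print(f"mIoU = {mean_iou(pred, target, num_classes=3):.4f}")
```

In practice the confusion matrix is accumulated over every image in the test set before the per-class IoU values are averaged, rather than averaging per-image scores.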
Pages: 1212-1226
Number of pages: 14
Related papers
34 in total
  • [1] LONG J, SHELHAMER E, DARRELL T., Fully convolutional networks for semantic segmentation, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431-3440, (2015)
  • [2] ZHAO H S, SHI J P, QI X J, et al., Pyramid scene parsing network, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6230-6239, (2017)
  • [3] LI X T, YOU A S, ZHU Z, et al., Semantic flow for fast and accurate scene parsing, Computer Vision – ECCV 2020, pp. 775-793, (2020)
  • [4] REN F L, YANG L, ZHOU H B, et al., Real-time semantic segmentation based on improved BiSeNet, Opt. Precision Eng, 31, 8, pp. 1217-1227, (2023)
  • [5] PASZKE A, CHAURASIA A, KIM S, et al., ENet: a deep neural network architecture for real-time semantic segmentation, (2016)
  • [6] ZHAO H S, QI X J, SHEN X Y, et al., ICNet for real-time semantic segmentation on high-resolution images, Computer Vision – ECCV 2018, pp. 418-434, (2018)
  • [7] LI H C, XIONG P F, FAN H Q, et al., DFANet: deep feature aggregation for real-time semantic segmentation, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9514-9523, (2019)
  • [8] YU C Q, WANG J B, PENG C, et al., BiSeNet: bilateral segmentation network for real-time semantic segmentation, Computer Vision – ECCV 2018, pp. 334-349, (2018)
  • [9] FAN M Y, LAI S Q, HUANG J S, et al., Rethinking BiSeNet for real-time semantic segmentation, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9711-9720, (2021)
  • [10] HU J, SHEN L, SUN G., Squeeze-and-excitation networks, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7132-7141, (2018)