Efficient pyramid context encoding and feature embedding for semantic segmentation

Cited by: 9
Authors
Liu, Mengyu [1]
Yin, Hujun [1]
Affiliations
[1] Univ Manchester, Dept Elect & Elect Engn, Manchester, Lancs, England
Keywords
Semantic segmentation; Convolutional neural networks; Pyramid context encoding; Real-time processing;
DOI
10.1016/j.imavis.2021.104195
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
For real-world applications of semantic segmentation, inference speed and memory usage are two important factors. To address these challenges, we propose a lightweight feature pyramid encoding network (FPENet) for semantic segmentation with a good trade-off between accuracy and speed. We use a series of feature pyramid encoding (FPE) blocks to encode context at multiple scales in the encoder. Each FPE block consists of different depthwise dilated convolutions that act as a spatial pyramid to extract features and reduce computational cost. During training, a one-shot neural architecture search algorithm is adopted to find the optimal structure for each FPE block from a large search space at a small search cost. After the encoder search, a mutual embedding upsample module consisting of two attention blocks is introduced in the decoder. This encoder-decoder attention mechanism helps to efficiently aggregate high-level semantic features and low-level spatial details. The proposed network outperforms existing real-time methods with fewer parameters and improved inference speed on the Cityscapes and CamVid benchmark datasets. Specifically, it achieved 72.3% mean IoU on the Cityscapes test set with only 0.4 M parameters and a speed of 192.6 FPS on an Nvidia Titan V100 GPU, and 73.4% mean IoU at 116.2 FPS when running on higher-resolution images. (c) 2021 Elsevier B.V. All rights reserved.
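The abstract attributes the low parameter count to depthwise dilated convolutions arranged as a spatial pyramid. The sketch below illustrates the general arithmetic behind that idea only; it is not the authors' FPE block. The 3x3 kernel size, the dilation rates (1, 2, 4, 8), the even split of channels across branches, and all function names are assumptions made for illustration, and bias terms are ignored.

```python
# Illustrative arithmetic for a pyramid of depthwise dilated convolutions.
# All names and default values here are hypothetical, not taken from the paper.

def effective_kernel(k, d):
    """Spatial extent covered by a k x k kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution (no bias)."""
    return c_in * c_out * k * k

def depthwise_conv_params(channels, k):
    """Weight count of a depthwise k x k convolution: one filter per channel (no bias)."""
    return channels * k * k

def pyramid_summary(channels, k=3, dilations=(1, 2, 4, 8)):
    """Split channels evenly across branches, one dilation rate per branch (assumed layout)."""
    group = channels // len(dilations)
    return [
        {
            "dilation": d,
            "effective_kernel": effective_kernel(k, d),
            "params": depthwise_conv_params(group, k),
        }
        for d in dilations
    ]

if __name__ == "__main__":
    for row in pyramid_summary(64):
        print(row)
    print("pyramid total:", sum(r["params"] for r in pyramid_summary(64)))
    print("standard 3x3, 64 -> 64:", standard_conv_params(64, 64, 3))
```

Under these assumptions, dilation changes the spatial extent each branch covers without changing its weight count, so the branches see context at several scales for the cost of a single depthwise layer.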
Pages: 11
Related papers
50 in total
  • [1] Pyramid Context Contrast for Semantic Segmentation
    Chen, Yuzhong
    Lin, Yangyang
    Niu, Yuzhen
    Ke, Xiao
    Huang, Tengda
    IEEE ACCESS, 2019, 7 : 173679 - 173693
  • [2] Context Encoding for Semantic Segmentation
    Zhang, Hang
    Dana, Kristin
    Shi, Jianping
    Zhang, Zhongyue
    Wang, Xiaogang
    Tyagi, Ambrish
    Agrawal, Amit
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 7151 - 7160
  • [3] Enhanced Feature Pyramid Network for Semantic Segmentation
    Ye, Mucong
    Ouyang, Jingpeng
    Chen, Ge
    Zhang, Jing
    Yu, Xiaogang
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 3209 - 3216
  • [4] Adaptive Pyramid Context Network for Semantic Segmentation
    He, Junjun
    Deng, Zhongying
    Zhou, Lei
    Wang, Yali
    Qiao, Yu
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 7511 - 7520
  • [5] PYRAMID-CONTEXT GUIDED FEATURE FUSION FOR RGB-D SEMANTIC SEGMENTATION
    Liu, Haoming
    Guo, Li
    Zhou, Zhongwen
    Zhang, Hanyuan
    2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022,
  • [6] A Unified Efficient Pyramid Transformer for Semantic Segmentation
    Zhu, Fangrui
    Zhu, Yi
    Zhang, Li
    Wu, Chongruo
    Fu, Yanwei
    Li, Mu
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2667 - 2677
  • [7] Efficient Attention Pyramid Network for Semantic Segmentation
    Yang, Qirui
    Ku, Tao
    Hu, Kunyuan
    IEEE ACCESS, 2021, 9 : 18867 - 18875
  • [8] Enhanced-feature pyramid network for semantic segmentation
    Quyen, Van Toan
    Lee, Jong Hyuk
    Kim, Min Young
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC, 2023, : 782 - 787
  • [9] Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
    Ni, Zhenliang
    Chen, Xinghao
    Zhai, Yingjie
    Tang, Yehui
    Wang, Yunhe
    COMPUTER VISION - ECCV 2024, PT LII, 2025, 15110 : 239 - 255
  • [10] Volumetric Semantic Segmentation using Pyramid Context Features
    Barron, Jonathan T.
    Arbelaez, Pablo
    Keraenen, Soile V. E.
    Biggin, Mark D.
    Knowles, David W.
    Malik, Jitendra
    2013 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2013, : 3448 - 3455