Head-Free Lightweight Semantic Segmentation with Linear Transformer

Cited by: 0
Authors: Dong, Bo [1]; Wang, Pichao [1]; Wang, Fan [1]
Affiliation: [1] Alibaba Group, Hangzhou, People's Republic of China
Keywords: (none listed)
DOI: (not available)
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Existing semantic segmentation works have mainly focused on designing effective decoders; however, the computational load introduced by the overall structure has long been ignored, which hinders their application on resource-constrained hardware. In this paper, we propose a head-free lightweight architecture specifically for semantic segmentation, named Adaptive Frequency Transformer (AFFormer). AFFormer adopts a parallel architecture that leverages prototype representations as specific learnable local descriptions, which replace the decoder and preserve the rich image semantics on high-resolution features. Although removing the decoder compresses most of the computation, the accuracy of the parallel structure is still limited by low computational resources. Therefore, we employ heterogeneous operators (CNN and Vision Transformer) for pixel embedding and prototype representations to further save computational costs. Moreover, it is very difficult to linearize the complexity of the Vision Transformer from the spatial-domain perspective. Because semantic segmentation is very sensitive to frequency information, we construct a lightweight prototype learning block with an adaptive frequency filter of complexity O(n) to replace standard self-attention of complexity O(n²). Extensive experiments on widely adopted datasets demonstrate that AFFormer achieves superior accuracy while retaining only 3M parameters. On the ADE20K dataset, AFFormer achieves 41.8 mIoU at 4.6 GFLOPs, which is 4.4 mIoU higher than SegFormer with 45% fewer GFLOPs. On the Cityscapes dataset, AFFormer achieves 78.7 mIoU at 34.4 GFLOPs, which is 2.5 mIoU higher than SegFormer with 72.5% fewer GFLOPs. Code is available at https://github.com/dongbo811/AFFormer.
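For intuition, the sketch below (PyTorch, not taken from the AFFormer repository; the class, parameter names, and initialization are invented for illustration) shows the general idea of frequency-domain token mixing as a cheap alternative to dense self-attention: a learnable filter applied to the 2D Fourier spectrum of the feature map mixes all spatial positions at roughly O(n log n) cost (GFNet-style global filtering), whereas the paper's adaptive frequency filter reaches O(n) with its own construction.

# Illustrative sketch only: frequency-domain token mixing with a learnable
# global filter (GFNet-style). This is NOT the AFF module from the paper;
# names, shapes, and initialization are assumptions made for this example.
import torch
import torch.nn as nn


class FrequencyFilterBlock(nn.Module):
    """Mixes spatial tokens by element-wise filtering in the 2D Fourier domain."""

    def __init__(self, dim: int, height: int, width: int):
        super().__init__()
        # Learnable complex filter over the half-spectrum produced by rfft2,
        # stored as (real, imag) pairs in the last dimension.
        self.filter = nn.Parameter(torch.randn(height, width // 2 + 1, dim, 2) * 0.02)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) feature map treated as an H*W grid of C-dim tokens.
        b, h, w, c = x.shape
        residual = x
        x = self.norm(x)
        # FFT over the spatial axes: ~O(n log n) for n = H*W tokens, versus the
        # O(n^2) pairwise interactions of dense self-attention.
        x_freq = torch.fft.rfft2(x, dim=(1, 2), norm="ortho")
        x_freq = x_freq * torch.view_as_complex(self.filter)
        x = torch.fft.irfft2(x_freq, s=(h, w), dim=(1, 2), norm="ortho")
        return x + residual


if __name__ == "__main__":
    block = FrequencyFilterBlock(dim=64, height=32, width=32)
    tokens = torch.randn(2, 32, 32, 64)
    print(block(tokens).shape)  # torch.Size([2, 32, 32, 64])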
Pages: 516-524 (9 pages)
Related papers (50 records in total)
  • [1] SSformer: A Lightweight Transformer for Semantic Segmentation
    Shi, Wentao
    Xu, Jing
    Gao, Pan
    2022 IEEE 24TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2022
  • [2] A lightweight siamese transformer for few-shot semantic segmentation
    Zhu, Hegui
    Zhou, Yange
    Jiang, Cong
    Yang, Lianping
    Jiang, Wuming
    Wang, Zhimu
    NEURAL COMPUTING & APPLICATIONS, 2024, 36 (13) : 7455 - 7469
  • [3] LSTFormer: Lightweight Semantic Segmentation Network Based on Swin Transformer
    Yang, Cheng
    Gao, Jianlin
    Zheng, Meilin
    Ding, Rong
    Computer Engineering and Applications, 2023, 59 (12) : 166 - 175
  • [4] GAZE STRATEGIES DURING LINEAR MOTION IN HEAD-FREE HUMANS
    BOREL, L
    LEGOFF, B
    CHARADE, O
    BERTHOZ, A
    JOURNAL OF NEUROPHYSIOLOGY, 1994, 72 (05) : 2451 - 2466
  • [5] Effect of a visual cue to head position on head-free pursuit
    Luthman, Nick
    Fogt, Nick
    Optometry and Vision Science, 2000, 77 (12 SUPPL.)
  • [6] Lightweight Real-Time Semantic Segmentation Network With Efficient Transformer and CNN
    Xu, Guoan
    Li, Juncheng
    Gao, Guangwei
    Lu, Huimin
    Yang, Jian
    Yue, Dong
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (12) : 15897 - 15906
  • [7] Head movement control during head-free gaze shifts
    Lehnen, Nadine
    Buettner, Ulrich
    Glasauer, Stefan
    USING EYE MOVEMENTS AS AN EXPERIMENTAL PROBE OF BRAIN FUNCTION - A SYMPOSIUM IN HONOR OF JEAN BUTTNER-ENNEVER, 2008, 171 : 331 - 334
  • [8] Orienting head movements during head-free pursuit
    Satgunam, P
    Fogt, N
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 2003, 44 : U359 - U359
  • [9] CLIP for Lightweight Semantic Segmentation
    Jin, Ke
    Yang, Wankou
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT X, 2024, 14434 : 323 - 333