JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting

Cited by: 2
Authors
Wang, Mingtao [1]
Zhou, Xin [1]
Chen, Yuanyuan [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
Keywords
Crowd counting; Count estimation; Multi-scale variations; Multi-density map supervision; PEOPLE; SCALE; MODEL
DOI
10.1007/s10115-023-02056-5
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it difficult to capture the global features needed for comprehensive contextual modeling, which limits the accuracy of count estimation. By comparison, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their strong global contextual modeling capabilities, but ViT models incur higher computational costs and are harder to train. In this paper, we propose a novel network named JMFEEL-Net, which uses joint multi-scale feature enhancement and a lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy to learn crowd distribution features from feature maps of different resolutions, thereby improving both the quality of the density maps and training efficiency. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and JHU-Crowd++, and achieve very competitive counting performance.
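The abstract does not spell out how the multi-density map supervision is computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that ground-truth density maps at lower resolutions are obtained by sum pooling (which preserves the total count) and that the loss is a weighted sum of per-scale mean squared errors. The function names `sum_pool` and `multi_density_loss`, the scale factors, and the weights are all hypothetical choices made for illustration.

```python
import numpy as np


def sum_pool(density, factor):
    """Downsample a 2D density map by summing non-overlapping
    factor x factor blocks. Summing (rather than averaging) keeps
    the total count of the map unchanged."""
    h, w = density.shape
    assert h % factor == 0 and w % factor == 0, "size must be divisible by factor"
    blocks = density.reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))


def multi_density_loss(predictions, gt_density, weights=None):
    """Weighted sum of per-scale MSE losses (an assumed formulation).

    predictions: dict mapping a downsampling factor to the predicted
                 density map at that resolution.
    gt_density:  full-resolution ground-truth density map.
    weights:     optional dict of per-scale loss weights (default 1.0).
    """
    if weights is None:
        weights = {factor: 1.0 for factor in predictions}
    total = 0.0
    for factor, pred in predictions.items():
        target = sum_pool(gt_density, factor)
        total += weights[factor] * float(np.mean((pred - target) ** 2))
    return total


# Toy ground truth: three annotated heads on an 8 x 8 grid.
gt = np.zeros((8, 8))
gt[1, 1] = gt[4, 5] = gt[6, 2] = 1.0

scales = (1, 2, 4)
counts = [float(sum_pool(gt, f).sum()) for f in scales]

# Predictions equal to the pooled ground truth give zero loss.
perfect = {f: sum_pool(gt, f) for f in scales}
perfect_loss = multi_density_loss(perfect, gt)

# All-zero predictions give a strictly positive loss.
empty = {f: np.zeros((8 // f, 8 // f)) for f in scales}
empty_loss = multi_density_loss(empty, gt)
```

In this toy setup the count is the same at every scale, which is the property that lets lower-resolution maps act as consistent supervision targets. A real network would produce the per-scale predictions from feature maps of different resolutions.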
Pages: 3033-3053
Page count: 21