JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting

Cited by: 1
Authors
Wang, Mingtao [1]
Zhou, Xin [1]
Chen, Yuanyuan [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China
Keywords
Crowd counting; Count estimation; Multi-scale variations; Multi-density map supervision; PEOPLE; SCALE; MODEL
DOI
10.1007/s10115-023-02056-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it challenging to capture global features for comprehensive contextual modeling, resulting in insufficient accuracy in count estimation. In comparison, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their powerful global contextual modeling capabilities. However, ViT models are associated with higher computational costs and training difficulty. In this paper, we propose a novel network named JMFEEL-Net, which utilizes joint multi-scale feature enhancement and lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the problem of low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy to learn crowd distribution features from feature maps of different resolutions, thereby improving the quality and training efficiency of the density maps. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A/B, UCF-QNRF, and JHU-Crowd++, achieving very competitive counting performance.
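The multi-density map supervision strategy named in the abstract can be illustrated with a small sketch. This record does not give the loss function, the resolutions, or the per-scale weights, so everything below is an assumption: the names `sum_pool`, `mse`, and `multi_scale_loss` are hypothetical, sum pooling is used only because it keeps the total count unchanged when the ground-truth map is downsampled, and mean squared error stands in for whatever loss the paper actually uses.

```python
from typing import List

Grid = List[List[float]]


def sum_pool(density: Grid, factor: int) -> Grid:
    """Downsample a density map by summing factor x factor blocks.

    Summing (rather than averaging) keeps the integral of the map,
    i.e. the crowd count, the same at every resolution.
    """
    rows, cols = len(density), len(density[0])
    if rows % factor or cols % factor:
        raise ValueError("map size must be divisible by the pooling factor")
    return [
        [
            sum(
                density[r + dr][c + dc]
                for dr in range(factor)
                for dc in range(factor)
            )
            for c in range(0, cols, factor)
        ]
        for r in range(0, rows, factor)
    ]


def mse(pred: Grid, target: Grid) -> float:
    """Mean squared error between two maps of identical shape."""
    n = len(target) * len(target[0])
    return sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    ) / n


def multi_scale_loss(
    preds: List[Grid], gt: Grid, factors: List[int], weights: List[float]
) -> float:
    """Weighted sum of per-resolution losses (assumed form, not the paper's)."""
    total = 0.0
    for pred, factor, weight in zip(preds, factors, weights):
        target = sum_pool(gt, factor)  # ground truth at this resolution
        total += weight * mse(pred, target)
    return total


# Toy 4x4 ground-truth density map whose entries sum to 2 people.
gt = [
    [0.0, 0.5, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 0.0, 0.25, 0.25],
]
half_res_pred = [[0.0, 0.0], [0.0, 1.0]]  # misses the person in the top-left block
loss = multi_scale_loss([gt, half_res_pred], gt, factors=[1, 2], weights=[1.0, 0.5])
```

In this toy case the full-resolution prediction is perfect, so only the half-resolution branch contributes to `loss`. In a real network each prediction would come from a feature map at the matching resolution.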
Pages: 3033 - 3053
Page count: 21
Related papers
50 in total
  • [1] JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting
    Mingtao Wang
    Xin Zhou
    Yuanyuan Chen
    [J]. Knowledge and Information Systems, 2024, 66 : 3033 - 3053
  • [2] MFP-Net: Multi-scale feature pyramid network for crowd counting
    Lei, Tao
    Zhang, Dong
    Wang, Risheng
    Li, Shuying
    Zhang, Weijiang
    Nandi, Asoke K.
    [J]. IET IMAGE PROCESSING, 2021, 15 (14) : 3522 - 3533
  • [3] Double multi-scale feature fusion network for crowd counting
    Liu, Qian
    Fang, Jiongtao
    Zhong, Yixiong
    Wang, Cunbao
    Qi, Youwei
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2024, (34) : 81831 - 81855
  • [4] Lightweight multi-scale network with attention for accurate and efficient crowd counting
    Xi, Mengyuan
    Yan, Hua
    [J]. VISUAL COMPUTER, 2024, 40 (06) : 4553 - 4566
  • [5] Multi-scale Feature Aggregation for Crowd Counting
    Jiang, Xiaoheng
    Wu, Xinyi
    Cholakkal, Hisham
    Anwer, Rao Muhammad
    Cao, Jiale
    Xu, Mingliang
    Zhou, Bing
    Pang, Yanwei
    Khan, Fahad Shahbaz
    [J]. arXiv, 2022
  • [6] Multi-scale dilated convolution of feature Fusion Network for Crowd counting
    Donghua Liu
    Guodong Wang
    Guangtao Zhai
    [J]. Multimedia Tools and Applications, 2022, 81 : 37939 - 37952
  • [7] A multi-scale and multi-level feature aggregation network for crowd counting
    Zhu, Fushun
    Yan, Hua
    Chen, Xinyue
    Li, Tong
    Zhang, Zhengyu
    [J]. NEUROCOMPUTING, 2021, 423 : 46 - 56
  • [8] Multi-scale dilated convolution of feature Fusion Network for Crowd counting
    Liu, Donghua
    Wang, Guodong
    Zhai, Guangtao
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (26) : 37939 - 37952
  • [10] Multi-scale supervised network for crowd counting
    Wang, Yongjie
    Zhang, Wei
    Huang, Dongxiao
    Liu, Yanyan
    Zhu, Jianghua
    [J]. IET IMAGE PROCESSING, 2020, 14 (17) : 4701 - 4707