JMFEEL-Net: a joint multi-scale feature enhancement and lightweight transformer network for crowd counting

Cited by: 2

Authors:

Wang, Mingtao [1]

Zhou, Xin [1]

Chen, Yuanyuan [1]

Affiliation:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Sichuan, Peoples R China

Source:

KNOWLEDGE AND INFORMATION SYSTEMS | 2024 / Vol. 66 / Issue 5

Keywords:

Crowd counting; Count estimation; Multi-scale variations; Multi-density map supervision; PEOPLE; SCALE; MODEL

DOI:

10.1007/s10115-023-02056-5

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Crowd counting based on convolutional neural networks (CNNs) has made significant progress in recent years. However, the limited receptive field of CNNs makes it difficult to capture the global features needed for comprehensive contextual modeling, which limits the accuracy of count estimation. In contrast, vision transformer (ViT)-based counting networks have demonstrated remarkable performance by exploiting their strong global contextual modeling capabilities. However, ViT models incur higher computational costs and are more difficult to train. In this paper, we propose a novel network, JMFEEL-Net, which jointly uses multi-scale feature enhancement and a lightweight transformer to improve crowd counting accuracy. Specifically, we use a high-resolution CNN as the backbone network to generate high-resolution feature maps. In the backend network, we propose a multi-scale feature enhancement module to address the low recognition accuracy caused by multi-scale variations, especially when counting small-scale objects in dense scenes. Furthermore, we introduce an improved lightweight ViT encoder to effectively model complex global contexts. We also adopt a multi-density map supervision strategy that learns crowd distribution features from feature maps of different resolutions, thereby improving both the quality of the estimated density maps and training efficiency. To validate the effectiveness of the proposed method, we conduct extensive experiments on four challenging datasets, namely ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, and JHU-Crowd++, and achieve highly competitive counting performance.
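
The multi-density map supervision strategy mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the loss, so what follows is an assumption and not the authors' method. Coarser ground-truth maps are obtained by sum-pooling a full-resolution density map. Because the estimated count of a density map is the sum of its values, sum-pooling keeps the head count unchanged. The training loss is then taken as a weighted sum of per-resolution mean-squared errors. The helper names (`points_to_density`, `sum_pool`, `multi_density_loss`), the pooling factors, and the weights are all hypothetical.

```python
import numpy as np


def points_to_density(points, shape):
    """Build a ground-truth density map with unit mass at each annotated head.

    Real pipelines usually blur each point with a Gaussian kernel; this sketch
    keeps one pixel per head so that the map sums exactly to the head count.
    """
    density = np.zeros(shape, dtype=np.float64)
    for row, col in points:
        if not (0 <= row < shape[0] and 0 <= col < shape[1]):
            raise ValueError(f"point {(row, col)} lies outside the map")
        density[row, col] += 1.0
    return density


def sum_pool(density, factor):
    """Downsample by summing non-overlapping factor x factor blocks.

    Summing (instead of averaging) keeps the total count unchanged, so the
    coarse map still integrates to the number of people.
    """
    h, w = density.shape
    if h % factor or w % factor:
        raise ValueError("map size must be divisible by the pooling factor")
    blocks = density.reshape(h // factor, factor, w // factor, factor)
    return blocks.sum(axis=(1, 3))


def multi_density_loss(preds, gt_full, factors, weights):
    """Weighted sum of per-resolution MSE losses (assumed formulation).

    preds[i] is the predicted density map at 1/factors[i] of full resolution.
    """
    if not (len(preds) == len(factors) == len(weights)):
        raise ValueError("preds, factors and weights must have equal length")
    total = 0.0
    for pred, factor, weight in zip(preds, factors, weights):
        target = sum_pool(gt_full, factor)
        if pred.shape != target.shape:
            raise ValueError(f"expected shape {target.shape}, got {pred.shape}")
        total += weight * float(np.mean((pred - target) ** 2))
    return total


if __name__ == "__main__":
    gt = points_to_density([(1, 2), (5, 6), (6, 6)], shape=(8, 8))
    factors = (1, 2, 4)          # full, 1/2 and 1/4 resolution (assumed)
    weights = (1.0, 0.5, 0.25)   # hypothetical per-scale weights
    perfect = [sum_pool(gt, f) for f in factors]
    print(gt.sum())                                            # 3.0
    print(multi_density_loss(perfect, gt, factors, weights))   # 0.0
```

Sum-pooling is used here because average-pooling would shrink the total by a factor of `factor**2`. A coarse map would then no longer integrate to the number of people.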

Pages: 3033-3053

Number of pages: 21