Multi-scale Supervised Attentive Encoder-Decoder Network for Crowd Counting

Cited by: 8

Authors:

Zhang, Anran [1]

Jiang, Xiaolong [1]

Zhang, Baochang [2]

Cao, Xianbin [1,3,4]

Affiliations:

[1] Beihang Univ, Sch Elect & Informat Engn, XueYuan Rd 37, Beijing, Peoples R China

[2] Beihang Univ, Sch Automat Sci & Elect Engn, XueYuan Rd 37, Beijing, Peoples R China

[3] Beihang Univ, Key Lab Adv Technol Near Space Informat Syst, Minist Ind & Informat Technol China, XueYuan Rd 37, Beijing, Peoples R China

[4] Beijing Adv Innovat Ctr Big Data Based Precis Med, XueYuan Rd 37, Beijing, Peoples R China

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2020 / Vol. 16 / No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Self-attention; representation fusion; supervised method; HUMANS; SEGMENTATION; TRACKING; MULTIPLE; IMAGE;

DOI:

10.1145/3356019

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Crowd counting is a popular topic with widespread applications. Currently, the biggest challenge in crowd counting is the large variation in object scale. In this article, we focus on overcoming this challenge by proposing a novel Attentive Encoder-Decoder Network (AEDN), which is supervised on multiple feature scales to conduct crowd counting via density estimation. This work has three main contributions. First, we augment the traditional encoder-decoder architecture with our proposed residual attention blocks, which, beyond skip-connected encoded features, further extend the decoded features with attentive features. AEDN is better at establishing long-range dependencies between the encoder and decoder, thereby promoting more effective fusion of multi-scale features for handling scale variations. Second, we design a new KL-divergence-based distribution loss to supervise the scale-aware structural differences between two density maps, which complements the pixel-isolated MSE loss and better optimizes AEDN to generate high-quality density maps. Third, we adopt a multi-scale supervision scheme, such that multiple KL-divergence and MSE losses are deployed at all decoding stages, providing more thorough supervision for different feature scales. Extensive experimental results on four public datasets, including ShanghaiTech Part A, ShanghaiTech Part B, UCF-CC-50, and UCF-QNRF, reveal the superiority and efficacy of the proposed method, which outperforms most state-of-the-art competitors.
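The abstract describes combining a pixel-wise MSE loss with a KL-divergence-based distribution loss and applying both at several decoding scales. The sketch below illustrates that general idea only; it is not the authors' formulation. The function names, the direction of the KL term (target relative to prediction), the normalization of each density map to sum to 1, the `eps` smoothing, and the `kl_weight` parameter are all assumptions made for illustration.

```python
import math

def mse_loss(pred, target):
    """Mean squared error over all pixels of two equally sized 2-D maps."""
    n = sum(len(row) for row in pred)
    return sum((p - t) ** 2
               for p_row, t_row in zip(pred, target)
               for p, t in zip(p_row, t_row)) / n

def kl_distribution_loss(pred, target, eps=1e-8):
    """KL(target || pred) after normalizing each map to sum to 1 (assumed form)."""
    # Clamp negatives and add eps so the logarithm is always defined.
    p_flat = [max(v, 0.0) + eps for row in pred for v in row]
    t_flat = [max(v, 0.0) + eps for row in target for v in row]
    p_sum, t_sum = sum(p_flat), sum(t_flat)
    return sum((t / t_sum) * math.log((t / t_sum) / (p / p_sum))
               for p, t in zip(p_flat, t_flat))

def multi_scale_loss(preds, targets, kl_weight=1.0):
    """Sum of MSE + kl_weight * KL over paired predictions and targets, one pair per scale."""
    return sum(mse_loss(p, t) + kl_weight * kl_distribution_loss(p, t)
               for p, t in zip(preds, targets))

# Hypothetical usage: two decoding scales, each with its own ground-truth map.
target_small = [[1.0, 0.0], [0.0, 1.0]]
pred_small = [[0.5, 0.5], [0.5, 0.5]]
target_large = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]
pred_large = [[0.4, 0.6, 0.1], [0.1, 0.4, 0.4]]
total = multi_scale_loss([pred_small, pred_large], [target_small, target_large])
```

The MSE term compares pixels independently, while the KL term compares how the mass of each map is distributed, which is the complementary role the abstract attributes to the two losses.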


Pages: 20

