Crowd Counting and Density Estimation by Trellis Encoder-Decoder Networks

Cited by: 231
Authors
Jiang, Xiaolong [1]
Xiao, Zehao [1]
Zhang, Baochang [3]
Zhen, Xiantong [4]
Cao, Xianbin [1,2]
Doermann, David [5]
Shao, Ling [4]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Beihang Univ, Minist Ind & Informat Technol China, Key Lab Adv Technol Near Space Informat Syst, Beijing, Peoples R China
[3] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
[5] SUNY Buffalo, Dept Comp Sci & Engn, Buffalo, NY USA
DOI
10.1109/CVPR.2019.00629
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Crowd counting has recently attracted increasing interest in computer vision but remains a challenging problem. In this paper, we propose a trellis encoder-decoder network (TEDnet) for crowd counting, which focuses on generating high-quality density estimation maps. The major contributions are four-fold. First, we develop a new trellis architecture that incorporates multiple decoding paths to hierarchically aggregate features at different encoding stages, which improves the representative capability of convolutional features for large variations in objects. Second, we employ dense skip connections interleaved across paths to facilitate sufficient multi-scale feature fusion, which also helps TEDnet absorb the supervision information. Third, we propose a new combinatorial loss to enforce similarities in local coherence and spatial correlation between maps. By imposing this combinatorial loss on intermediate outputs in a distributed manner, TEDnet can improve the back-propagation process and alleviate the vanishing gradient problem. Finally, on four widely used benchmarks, TEDnet achieves the best overall performance in terms of both density map quality and counting accuracy, with an improvement of up to 14% in MAE. These results validate the effectiveness of TEDnet for crowd counting.
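The abstract describes a combinatorial loss that enforces local coherence and spatial correlation between an estimated density map and its ground truth, but it gives no formulas. The following is only an illustrative sketch of how such a two-term loss might be assembled, and it is not the paper's definition. It makes three assumptions: the local term is a mean squared error accumulated over average-pooled copies of the maps, the correlation term is one minus a normalized cross-correlation, and the two are mixed by a hypothetical `weight` parameter. All function names are hypothetical.

```python
from math import sqrt


def avg_pool(density, k):
    """Non-overlapping k x k average pooling on a 2D list of floats.

    Rows or columns that do not fill a complete window are dropped.
    """
    h, w = len(density) // k, len(density[0]) // k
    return [
        [
            sum(density[i * k + a][j * k + b] for a in range(k) for b in range(k))
            / (k * k)
            for j in range(w)
        ]
        for i in range(h)
    ]


def local_coherence_loss(pred, gt, levels=2):
    """Assumed local term: MSE summed over progressively pooled maps.

    The maps must be at least 2**(levels - 1) pixels on each side.
    """
    total = 0.0
    p, g = pred, gt
    for _ in range(levels):
        n = len(p) * len(p[0])
        total += sum(
            (a - b) ** 2 for row_p, row_g in zip(p, g) for a, b in zip(row_p, row_g)
        ) / n
        p, g = avg_pool(p, 2), avg_pool(g, 2)
    return total


def spatial_correlation_loss(pred, gt, eps=1e-12):
    """Assumed global term: 1 - normalized cross-correlation of the two maps."""
    flat_p = [v for row in pred for v in row]
    flat_g = [v for row in gt for v in row]
    dot = sum(a * b for a, b in zip(flat_p, flat_g))
    norm = sqrt(sum(a * a for a in flat_p)) * sqrt(sum(b * b for b in flat_g))
    return 1.0 - dot / (norm + eps)


def combinatorial_loss(pred, gt, weight=1.0):
    """Sum of the local term and the weighted correlation term (illustrative)."""
    return local_coherence_loss(pred, gt) + weight * spatial_correlation_loss(pred, gt)


# Toy 4 x 4 ground-truth density map and an all-zero prediction.
gt_map = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
zero_map = [[0.0] * 4 for _ in range(4)]

loss_same = combinatorial_loss(gt_map, gt_map)    # close to zero
loss_zero = combinatorial_loss(zero_map, gt_map)  # strictly positive
```

In a training setting, a loss of this kind would be evaluated on each intermediate output as well as the final map, which is the distributed supervision the abstract mentions. The sketch covers a single prediction only.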
Pages: 6126-6135
Number of pages: 10
Related papers
50 in total
  • [1] Attentive encoder-decoder networks for crowd counting
    Liu, Xuhui
    Hu, Yutao
    Zhang, Baochang
    Zhen, Xiantong
    Luo, Xiaoyan
    Cao, Xianbin
    [J]. NEUROCOMPUTING, 2022, 490 : 246 - 257
  • [2] SENetCount: An Optimized Encoder-Decoder Architecture with Squeeze-and-Excitation for Crowd Counting
    Meng, Xiaolong
    [J]. WIRELESS COMMUNICATIONS & MOBILE COMPUTING, 2022, 2022
  • [3] Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting
    Thanasutives, Pongpisit
    Fukui, Ken-ichi
    Numao, Masayuki
    Kijsirikul, Boonserm
    [J]. 2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 2382 - 2389
  • [4] Multi-scale Supervised Attentive Encoder-Decoder Network for Crowd Counting
    Zhang, Anran
    Jiang, Xiaolong
    Zhang, Baochang
    Cao, Xianbin
    [J]. ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2020, 16 (01)
  • [5] MobileCount: An efficient encoder-decoder framework for real-time crowd counting
    Wang, Peng
    Gao, Chenyu
    Wang, Yang
    Li, Hui
    Gao, Ye
    [J]. NEUROCOMPUTING, 2020, 407 : 292 - 299
  • [6] Counting in congested crowd scenes with hierarchical scale-aware encoder-decoder network
    Han, Run
    Qi, Ran
    Lu, Xuequan
    Huang, Lei
    Lyu, Lei
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2024, 238
  • [7] Semantic Translation with Convolutional Encoder-decoder Networks for Viewpoint Estimation
    Zhang, Liangjun
    Gu, Changjian
    Gu, Chaochen
    Wu, Kaijie
    Guan, Xinping
    [J]. 2017 11TH ASIAN CONTROL CONFERENCE (ASCC), 2017 : 1660 - 1665
  • [8] Encoder-decoder with densely convolutional networks for monocular depth estimation
    Chen, Songnan
    Tang, Mengxia
    Kan, Jiangming
    [J]. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2019, 36 (10) : 1709 - 1718
  • [9] An Effective Lightweight Crowd Counting Method Based on an Encoder-Decoder Network for Internet of Video Things
    Yi, Jun
    Chen, Fan
    Shen, Zhilong
    Xiang, Yi
    Xiao, Shan
    Zhou, Wei
    [J]. IEEE INTERNET OF THINGS JOURNAL, 2024, 11 (02) : 3082 - 3094
  • [10] Interpretable Transformations with Encoder-Decoder Networks
    Worrall, Daniel E.
    Garbin, Stephan J.
    Turmukhambetov, Daniyar
    Brostow, Gabriel J.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017 : 5737 - 5746