An interactive network based on transformer for multimodal crowd counting

Cited by: 0
Authors
Ying Yu
Zhen Cai
Duoqian Miao
Jin Qian
Hong Tang
Affiliations
[1] College of Software Engineering
[2] Department of Computer Science and Technology
Source
Applied Intelligence | 2023 / Vol. 53 / No. 19
Keywords
Crowd counting; Transformer; Multimodal data; Feature fusion
DOI
Not available
Abstract
Crowd counting is the task of estimating the total number of pedestrians in an image. Most existing research addresses scenes with good visibility, such as parks, squares, and brightly lit shopping malls during the day, while little research has examined complex scenes in darkness. To study this problem, we propose an interactive network based on Transformer for multimodal crowd counting. First, sliding convolutional encoding is applied to the image to obtain better encoded features. The features are then extracted by the designed primary interaction network and modulated with channel token attention. Next, FGAF-MLP fuses high- and low-level semantics, enhancing feature expression and fully fusing the data from different modalities to improve the accuracy of the method. To verify the effectiveness of our method, we conducted extensive ablation experiments on the latest multimodal benchmark, RGBT-CC, verifying both the complementarity of the multimodal data and the effectiveness of the model components. We also verified the effectiveness of our method on the ShanghaiTechRGBD benchmark. The experimental results show that the proposed method performs well, achieving an improvement of more than 10% in terms of the mean absolute error and mean squared error on the RGBT-CC benchmark.
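The abstract reports its gains on RGBT-CC in terms of MAE and MSE. As a minimal sketch, assuming the convention common in the crowd-counting literature (MAE averages the absolute per-image count error, and the reported "MSE" is actually the square root of the mean squared count error), the two metrics and a relative improvement over a baseline can be computed as below. The function names and sample counts are hypothetical illustrations, not taken from the paper, whose own evaluation code may differ.

```python
import math
from typing import Sequence


def _check(pred: Sequence[float], gt: Sequence[float]) -> None:
    # Both lists hold one total count per test image.
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and of equal length")


def mae(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error between predicted and ground-truth counts."""
    _check(pred, gt)
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def mse(pred: Sequence[float], gt: Sequence[float]) -> float:
    """'MSE' as conventionally reported in crowd counting (assumption):
    the square root of the mean squared count error, i.e. an RMSE."""
    _check(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))


def relative_improvement(baseline: float, ours: float) -> float:
    """Fractional error reduction versus a baseline (0.10 means 10%)."""
    return (baseline - ours) / baseline


if __name__ == "__main__":
    # Hypothetical per-image counts, for illustration only.
    pred = [10.0, 20.0, 30.0]
    gt = [12.0, 18.0, 33.0]
    print(mae(pred, gt))                     # 7/3, about 2.333
    print(mse(pred, gt))                     # sqrt(17/3), about 2.380
    print(relative_improvement(20.0, 17.5))  # 0.125, i.e. 12.5%
```

Under this convention, a claim of "more than 10%" corresponds to `relative_improvement(baseline, ours) > 0.10` for each metric, where the baseline is whichever prior method the comparison is made against.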
Pages: 22602-22614
Number of pages: 12