An interactive network based on transformer for multimodal crowd counting

Cited by: 0

Authors:

Ying Yu

Zhen Cai

Duoqian Miao

Jin Qian

Hong Tang

Affiliations:

[1] College of Software Engineering,

[2] Department of Computer Science and Technology,

Source:

Applied Intelligence | 2023 / Vol. 53

Keywords:

Crowd counting; Transformer; Multimodal data; Feature fusion;

DOI:

Not available

Chinese Library Classification number:

Subject classification number:

Abstract:

Crowd counting is the task of estimating the total number of pedestrians in an image. Most existing research addresses scenes with good visibility, such as parks, squares, and brightly lit shopping malls during the day. However, there is little research on complex scenes in darkness. To study this problem, we propose an interactive network based on Transformer for multimodal crowd counting. First, sliding convolutional encoding is applied to the image to obtain better encoding features. The features are extracted through the designed primary interaction network, and channel token attention is then used to modulate them. Next, the FGAF-MLP fuses high-level and low-level semantics to enhance the feature expression and fully fuse the data from different modalities, improving the accuracy of the method. To verify the effectiveness of our method, we conducted extensive ablation experiments on the latest multimodal benchmark, RGBT-CC, and verified the complementarity between the modalities and the effectiveness of the model components. We also verified the effectiveness of our method on the ShanghaiTechRGBD benchmark. The experimental results show that our proposed method achieves an improvement of more than 10% in terms of the mean average error and mean squared error on the RGBT-CC benchmark.
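The abstract reports its gains as a relative improvement in two error metrics. As a rough illustration of how such figures are typically computed, the following is a minimal sketch, assuming per-image predicted and ground-truth counts are available as plain lists. The function names and the toy numbers are hypothetical and are not taken from the paper. Some crowd counting papers report the square root of the mean squared error under the same name; which convention this paper uses is not stated in the abstract.

```python
import math


def mae(pred, truth):
    """Mean of absolute count errors over all images."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def mse(pred, truth):
    """Mean of squared count errors over all images."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def relative_improvement(baseline_error, new_error):
    """Fractional reduction of an error metric relative to a baseline."""
    return (baseline_error - new_error) / baseline_error


# Toy values for illustration only (not results from the paper).
truth = [100, 50, 200]
pred = [90, 55, 210]

print("MAE :", mae(pred, truth))
print("MSE :", mse(pred, truth))
print("RMSE:", math.sqrt(mse(pred, truth)))
```

Under this sketch, an "improvement of more than 10%" would correspond to `relative_improvement(baseline_error, new_error) > 0.10` for each metric.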


Pages: 22602 - 22614

Page count: 12

50 records in total

[1] An interactive network based on transformer for multimodal crowd counting
Yu, Ying
Cai, Zhen
Miao, Duoqian
Qian, Jin
Tang, Hong
APPLIED INTELLIGENCE, 2023, 53 (19) : 22602 - 22614
[2] Transformer-Based Feature Aggregation and Stitching Network for Crowd Counting
Wang, Kehao
Wang, Yuhui
Ren, Ruiqi
Zou, Han
Shao, Zhichao
IEEE ACCESS, 2023, 11 : 124833 - 124844
[3] Transformer-CNN hybrid network for crowd counting
Yu J.
Yu Y.
Qian J.
Han X.
Zhu F.
Zhu Z.
Journal of Intelligent and Fuzzy Systems, 2024, 46 (04) : 10773 - 10785
[4] Weakly supervised crowd counting based on Swin Transformer
Feng, Min
Hao, Linlin
Kuang, Yonggang
2023 THE 6TH INTERNATIONAL CONFERENCE ON ROBOT SYSTEMS AND APPLICATIONS, ICRSA 2023, 2023, : 229 - 236
[5] Audio-Visual Transformer Based Crowd Counting
Sajid, Usman
Chen, Xiangyu
Sajid, Hasan
Kim, Taejoon
Wang, Guanghui
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 2249 - 2259
[6] Application of improved transformer based on weakly supervised in crowd localization and crowd counting
Hui Gao
Wenjun Zhao
Dexian Zhang
Miaolei Deng
Scientific Reports, 13
[7] Application of improved transformer based on weakly supervised in crowd localization and crowd counting
Gao, Hui
Zhao, Wenjun
Zhang, Dexian
Deng, Miaolei
SCIENTIFIC REPORTS, 2023, 13 (01)
[8] LOCALITY-CONSTRAINED SPATIAL TRANSFORMER NETWORK FOR VIDEO CROWD COUNTING
Fang, Yanyan
Zhan, Biyun
Cai, Wandi
Gao, Shenghua
Hu, Bo
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 814 - 819
[9] Crowd counting in congested scene by CNN and Transformer Crowd counting for converged networks
Lin, Yuanyuan
Yang, Huicheng
Hu, Yaocong
Shuai, Zhen
Li, Wenting
PROCEEDINGS OF 2023 7TH INTERNATIONAL CONFERENCE ON ELECTRONIC INFORMATION TECHNOLOGY AND COMPUTER ENGINEERING, EITCE 2023, 2023, : 1092 - 1095
[10] CCST: crowd counting with swin transformer
Bo Li
Yong Zhang
Haihui Xu
Baocai Yin
The Visual Computer, 2023, 39 : 2671 - 2682
