A hierarchical Transformer network for smoke video recognition

Cited by: 0
Authors
Cheng, Guangtao [1]
Xian, Baoyi [1]
Liu, Yifan [1]
Chen, Xue [2]
Hu, Lianjun [1]
Song, Zhanjie [3]
Affiliations
[1] Tianjin Univ Commerce, Sch Informat Engn, Tianjin, Peoples R China
[2] Tianjin Univ, Law Sch, Tianjin, Peoples R China
[3] Tianjin Univ, Sch Math, Tianjin, Peoples R China
Keywords
Smoke recognition; Deep learning; Transformer; Fire detection; Detection algorithm; Color; Motion; Image; Model
DOI
10.1016/j.dsp.2024.104959
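Chinese Library Classification (CLC) number
TM [Electrical technology]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract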
During fire incidents, quick and accurate identification of smoke is crucial for issuing early warnings and reducing fire risk. This paper proposes an accurate and efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design a SoftPool-based multi-head self-attention (SMHSA) module, which performs self-attention on shortened sequences. This approach extracts global features across various smoke patterns while reducing computational complexity and preserving essential feature information. Our hierarchical network architecture integrates SMHSA modules progressively, enhancing the modeling of global dependencies among image patches of different scales. Specifically, shallower layers analyze small-scale patches, while deeper layers focus on larger-scale patches. This structure improves the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Additionally, self-attention is applied to sequences of progressively decreasing length, which significantly reduces computational complexity. To support thorough evaluation and further work in this field, we have created a dedicated smoke video recognition dataset (SVRD) that covers a wide range of scenarios and smoke patterns. Using the SVRD, we conducted extensive experiments to validate the effectiveness of our approach. The results show that the proposed network achieves higher accuracy in smoke recognition than existing methods at a significantly lower computational cost.
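The abstract does not spell out how SMHSA shortens the sequence, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that keys and values are reduced by SoftPool (each window is replaced by its softmax-weighted average) while queries keep the full length. It also uses a single head, omits the learned query/key/value projections, and pools over a 1D token sequence instead of a 2D patch grid. The function names `softpool_1d`, `attention`, and `pooled_self_attention` are hypothetical.

```python
import math


def softpool_1d(tokens, window):
    """SoftPool over non-overlapping windows of a token sequence, per channel.

    Each output value is sum_i softmax(a)_i * a_i over the window.
    Trailing tokens that do not fill a whole window are dropped.
    """
    pooled = []
    for start in range(0, len(tokens) - window + 1, window):
        block = tokens[start:start + window]
        out = []
        for c in range(len(block[0])):
            vals = [tok[c] for tok in block]
            m = max(vals)  # subtract the max for numerical stability
            weights = [math.exp(v - m) for v in vals]
            total = sum(weights)
            out.append(sum(w * v for w, v in zip(weights, vals)) / total)
        pooled.append(out)
    return pooled


def attention(queries, keys, values):
    """Single-head scaled dot-product attention on lists of vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    result = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        probs = [e / z for e in exps]
        result.append([
            sum(p * v[c] for p, v in zip(probs, values))
            for c in range(len(values[0]))
        ])
    return result


def pooled_self_attention(tokens, window):
    """Assumption: queries use the full sequence, keys/values the pooled one."""
    reduced = softpool_1d(tokens, window)
    return attention(tokens, reduced, reduced)


# Toy input: 8 tokens with 2 channels each.
tokens = [[float(i), float(-i)] for i in range(8)]
out = pooled_self_attention(tokens, window=4)
print(len(out), len(out[0]))
```

Under these assumptions, a sequence of 8 tokens pooled with a window of 4 needs an 8 x 2 score matrix (16 entries) instead of the 8 x 8 matrix (64 entries) of full self-attention, which illustrates where the claimed cost reduction would come from.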
Pages: 15