A hierarchical Transformer network for smoke video recognition

Cited by: 0
Authors
Cheng, Guangtao [1]
Xian, Baoyi [1]
Liu, Yifan [1]
Chen, Xue [2]
Hu, Lianjun [1]
Song, Zhanjie [3]
Affiliations
[1] Tianjin Univ Commerce, Sch Informat Engn, Tianjin, Peoples R China
[2] Tianjin Univ, Law Sch, Tianjin, Peoples R China
[3] Tianjin Univ, Sch Math, Tianjin, Peoples R China
Keywords
Smoke recognition; Deep learning; Transformer; Fire detection; FIRE DETECTION; DETECTION ALGORITHM; COLOR; MOTION; IMAGE; MODEL;
DOI
10.1016/j.dsp.2024.104959
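Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809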
Abstract
During fire incidents, quick and accurate identification of smoke is crucial for issuing early warnings and reducing fire risk. This paper proposes an accurate and efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design a SoftPool-based multi-head self-attention (SMHSA) module, which performs self-attention on shortened sequences. This approach facilitates the extraction of global features across various smoke patterns while reducing computational complexity and preserving essential feature information. Our hierarchical network architecture integrates SMHSA modules progressively, enhancing the modeling of global dependencies among image patches of different scales. Specifically, shallower layers analyze small-scale patches, while deeper layers focus on larger-scale patches. This structure improves the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Additionally, the self-attention mechanism operates on sequences of progressively decreasing length, leading to a significant reduction in computational complexity. To support thorough evaluation and advancement in this field, we have created a dedicated smoke video recognition dataset (SVRD) that includes a wide range of scenarios and smoke patterns. Using the SVRD, we conducted extensive experiments to validate the effectiveness of our approach. The results show that the proposed network achieves higher smoke recognition accuracy than existing methods at significantly lower computational cost.
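The idea of running self-attention on a shortened sequence can be illustrated with a minimal NumPy sketch. It is not the authors' SMHSA module: the abstract does not give the layer details, so everything here is an assumption made for illustration, including the function names `softpool_1d` and `pooled_self_attention`, the use of 1-D non-overlapping windows, and pooling only the key/value side. SoftPool is taken in its usual sense of a softmax-weighted sum over each pooling window.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def softpool_1d(x, window):
    """Softmax-weighted pooling over non-overlapping windows (assumed 1-D form).

    x: (n, d) token sequence. Returns (n // window, d).
    Tokens beyond the last full window are dropped.
    """
    n, d = x.shape
    m = n // window
    blocks = x[: m * window].reshape(m, window, d)
    weights = softmax(blocks, axis=1)  # per-channel weights inside each window
    return (weights * blocks).sum(axis=1)


def pooled_self_attention(x, wq, wk, wv, num_heads, window):
    """Multi-head attention with keys/values computed from a pooled sequence.

    Hypothetical illustration: queries keep the full length n, while keys and
    values use the SoftPool-shortened sequence of length m = n // window.
    """
    n, d = x.shape
    dh = d // num_heads
    pooled = softpool_1d(x, window)  # (m, d)
    m = pooled.shape[0]

    def split(t, length):
        # (length, d) -> (heads, length, dh)
        return t.reshape(length, num_heads, dh).transpose(1, 0, 2)

    q = split(x @ wq, n)
    k = split(pooled @ wk, m)
    v = split(pooled @ wv, m)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, n, m)
    attn = softmax(scores, axis=-1)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, d)
    return out, attn


rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))
wq, wk, wv = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = pooled_self_attention(tokens, wq, wk, wv, num_heads=2, window=4)
print(out.shape, attn.shape)
```

Under these assumptions the attention score matrix has shape (heads, n, n / window) instead of (heads, n, n), so the cost of the score computation shrinks roughly in proportion to the window size.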
Pages: 15