A hierarchical Transformer network for smoke video recognition

Times cited: 0
Authors
Cheng, Guangtao [1]
Xian, Baoyi [1]
Liu, Yifan [1]
Chen, Xue [2]
Hu, Lianjun [1]
Song, Zhanjie [3]
Affiliations
[1] Tianjin University of Commerce, School of Information Engineering, Tianjin, People's Republic of China
[2] Tianjin University, Law School, Tianjin, People's Republic of China
[3] Tianjin University, School of Mathematics, Tianjin, People's Republic of China
Keywords
Smoke recognition; Deep learning; Transformer; Fire detection; Detection algorithm; Color; Motion; Image; Model
DOI
10.1016/j.dsp.2024.104959
CLC number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809
Abstract
During fire incidents, quick and accurate identification of smoke is crucial for issuing early warnings and reducing fire risk. This paper proposes an accurate and efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design a SoftPool-based multi-head self-attention (SMHSA) module that performs self-attention on shortened sequences. This design extracts global features across diverse smoke patterns while reducing computational complexity and preserving essential feature information. The hierarchical architecture integrates SMHSA modules progressively to model global dependencies among image patches of different scales: shallower layers analyze small-scale patches, while deeper layers focus on larger-scale patches. This structure strengthens the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Because self-attention is applied to sequences of progressively decreasing length, computational complexity is also substantially reduced. To support thorough evaluation and further research in this field, we built a dedicated smoke video recognition dataset (SVRD) covering a wide range of scenarios and smoke patterns. Extensive experiments on the SVRD validate the effectiveness of our approach: the proposed network achieves higher smoke recognition accuracy than existing methods at a significantly lower computational cost.
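The abstract does not specify exactly how SMHSA shortens the sequence, so the following is only a minimal NumPy sketch under stated assumptions. It assumes that SoftPool (softmax-weighted pooling, applied here per channel over non-overlapping windows of `r` tokens) shortens the key/value sequence while queries keep their full length, in the style of spatial-reduction attention. The function names `softpool_1d` and `smhsa_sketch`, the reduction ratio `r`, and the random projection weights are hypothetical illustrations, not the authors' implementation. The point it shows is why attention on a shortened sequence is cheaper: each head's attention map has N × N/r entries instead of N × N.

```python
import numpy as np

def softpool_1d(x, r):
    """SoftPool along the token axis (assumed variant: non-overlapping windows).

    Each window of r tokens is replaced by the softmax-weighted sum of its
    tokens, computed independently per channel. x: (N, C), N divisible by r.
    Returns an array of shape (N // r, C).
    """
    n, c = x.shape
    if n % r != 0:
        raise ValueError("sequence length must be divisible by r")
    windows = x.reshape(n // r, r, c)                      # (N/r, r, C)
    # Subtracting the per-window max keeps exp() stable without changing weights.
    w = np.exp(windows - windows.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)                   # softmax within each window
    return (w * windows).sum(axis=1)                       # (N/r, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def smhsa_sketch(x, wq, wk, wv, wo, num_heads, r):
    """Hypothetical SoftPool-reduced multi-head self-attention (not the paper's code).

    Queries come from the full sequence; keys and values come from the
    SoftPool-shortened sequence, so each attention map is (N, N/r).
    """
    n, c = x.shape
    d = c // num_heads
    x_red = softpool_1d(x, r)                                        # (N/r, C)
    q = (x @ wq).reshape(n, num_heads, d).transpose(1, 0, 2)         # (h, N, d)
    k = (x_red @ wk).reshape(-1, num_heads, d).transpose(1, 0, 2)    # (h, N/r, d)
    v = (x_red @ wv).reshape(-1, num_heads, d).transpose(1, 0, 2)    # (h, N/r, d)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d))            # (h, N, N/r)
    out = (attn @ v).transpose(1, 0, 2).reshape(n, c)                # (N, C)
    return out @ wo, attn

# Usage with random inputs and weights (illustrative sizes only).
rng = np.random.default_rng(0)
N, C, H, R = 64, 32, 4, 4
x = rng.standard_normal((N, C))
wq, wk, wv, wo = (rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4))
y, attn = smhsa_sketch(x, wq, wk, wv, wo, num_heads=H, r=R)
print(y.shape, attn.shape)  # (64, 32) (4, 64, 16)
```

With N = 64 tokens and r = 4, each head's attention map has 64 × 16 = 1,024 entries instead of 64 × 64 = 4,096. SoftPool is used here rather than average pooling because it weights each window toward its stronger activations, which matches the abstract's stated aim of preserving essential feature information while shortening the sequence.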
Pages: 15