A hierarchical Transformer network for smoke video recognition

Cited by: 0
Authors
Cheng, Guangtao [1]
Xian, Baoyi [1]
Liu, Yifan [1]
Chen, Xue [2]
Hu, Lianjun [1]
Song, Zhanjie [3]
Affiliations
[1] Tianjin University of Commerce, School of Information Engineering, Tianjin, People's Republic of China
[2] Tianjin University, Law School, Tianjin, People's Republic of China
[3] Tianjin University, School of Mathematics, Tianjin, People's Republic of China
Keywords
Smoke recognition; Deep learning; Transformer; Fire detection; Detection algorithm; Color; Motion; Image; Model
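DOI
10.1016/j.dsp.2024.104959
Chinese Library Classification
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809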
Abstract
During fire incidents, quick and accurate identification of smoke is crucial for issuing early warnings and reducing fire risk. This paper proposes an accurate and efficient smoke video recognition network based on a novel hierarchical Transformer architecture. We design a SoftPool-based multi-head self-attention (SMHSA) module that performs self-attention on shortened sequences. This design extracts global features across diverse smoke patterns while reducing computational complexity and preserving essential feature information. The hierarchical architecture integrates SMHSA modules progressively, improving the modeling of global dependencies among image patches of different scales: shallower layers analyze small-scale patches, while deeper layers focus on larger-scale patches. This structure strengthens the model's ability to capture multi-scale information, which is critical for accurate smoke recognition in video sequences. Because self-attention is applied to sequences of progressively decreasing length, the computational complexity is substantially reduced. To support thorough evaluation and further progress in this field, we built a dedicated smoke video recognition dataset (SVRD) covering a wide range of scenarios and smoke patterns. Extensive experiments on the SVRD validate the effectiveness of the approach: the proposed network achieves higher smoke recognition accuracy than existing methods while requiring significantly lower computational cost.
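The abstract does not give the exact formulation of SMHSA, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes, in the style of spatial-reduction attention used by pyramid vision Transformers, that queries keep all N tokens while keys and values are shortened by SoftPool over non-overlapping windows of r tokens. SoftPool outputs, per channel, the softmax-weighted sum of the activations in a window, so strong responses dominate without discarding the rest. The names `softpool_1d`, `pooled_mhsa` and `score_cost` are hypothetical; the Q/K/V projections are simplified to identity maps, and pooling is 1-D over the token sequence rather than 2-D over the patch grid.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = tokens, columns = channels


def softpool_1d(x: Matrix, window: int) -> Matrix:
    """SoftPool over non-overlapping windows of `window` consecutive tokens.

    Per channel, each output is sum_i softmax(a)_i * a_i over the window.
    Trailing tokens that do not fill a whole window are dropped.
    """
    pooled: Matrix = []
    for start in range(0, len(x) - window + 1, window):
        block = x[start:start + window]
        out_row = []
        for c in range(len(x[0])):
            vals = [row[c] for row in block]
            m = max(vals)  # subtract the max for numerical stability
            weights = [math.exp(v - m) for v in vals]
            total = sum(weights)
            out_row.append(sum(w * v for w, v in zip(weights, vals)) / total)
        pooled.append(out_row)
    return pooled


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]


def pooled_mhsa(x: Matrix, num_heads: int, window: int) -> Matrix:
    """Multi-head self-attention with SoftPool-shortened keys/values.

    Hypothetical simplification: Q, K and V projections are identity maps.
    Queries keep all N tokens; keys/values use N // window pooled tokens,
    so each head's score matrix is N x (N // window) instead of N x N.
    """
    n, d = len(x), len(x[0])
    assert d % num_heads == 0, "channels must split evenly across heads"
    dh = d // num_heads
    kv = softpool_1d(x, window)
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        lo, hi = h * dh, (h + 1) * dh  # channel slice owned by head h
        for i in range(n):
            q = x[i][lo:hi]
            scores = [sum(a * b for a, b in zip(q, k[lo:hi])) / math.sqrt(dh)
                      for k in kv]
            attn = softmax(scores)
            for c in range(lo, hi):
                out[i][c] = sum(a * k[c] for a, k in zip(attn, kv))
    return out


def score_cost(n_query: int, n_key: int, dim: int) -> int:
    """Multiply-adds for Q @ K^T (the same count again for attn @ V)."""
    return n_query * n_key * dim


if __name__ == "__main__":
    # 16 tokens with 8 channels of synthetic features
    tokens = [[math.sin(i + c) for c in range(8)] for i in range(16)]
    y = pooled_mhsa(tokens, num_heads=2, window=4)
    print(len(y), len(y[0]))      # 16 8
    print(score_cost(16, 16, 8))  # full attention: 2048
    print(score_cost(16, 4, 8))   # pooled keys/values: 512
```

Pooling keys and values by a factor r divides the cost of the attention product by roughly r. In a hierarchical layout the query length N also shrinks from stage to stage as patches grow larger, so both factors of the N × M score matrix fall in deeper layers.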
Pages: 15