CNN-TRANSFORMER WITH SELF-ATTENTION NETWORK FOR SOUND EVENT DETECTION

Cited by: 5
Authors
Wakayama, Keigo [1]
Saito, Shoichiro [1]
Affiliations
[1] NTT Corp, Tokyo, Japan
Keywords
Sound event detection; Weakly-supervised SED; DNN architecture; Self-attention Network; Vector attention
DOI
10.1109/ICASSP43922.2022.9747762
CLC classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
In sound event detection (SED), the representation ability of deep neural network (DNN) models must be increased to significantly improve accuracy or to handle more classifiable classes. When building such large-scale DNN models, a highly parameter-efficient architecture is preferable. In image recognition, it has been proposed to replace the convolutional neural network (CNN) that extracts high-level features, the essential information contributing to prediction, with a more parameter-efficient architecture: the self-attention network (SAN). In SED, however, our experiments show that simply replacing the CNN with a SAN makes it difficult to build a model that exceeds the prediction accuracy of the CNN-Transformer. To construct a model with high prediction accuracy while still capturing the properties of acoustic signals well, we propose the CNN-SAN-Transformer, an architecture that retains CNN in the blocks close to the input and uses SAN in all remaining blocks. Experimental results suggest that the proposed method matches or exceeds the prediction accuracy of the CNN-Transformer with fewer parameters, achieves higher accuracy with a similar number of parameters, and may therefore be a parameter-efficient architecture.
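The block layout described in the abstract (CNN blocks near the input, SAN blocks in the remaining positions, a Transformer-style attention layer aggregating over time) can be sketched numerically. The following is a minimal, illustrative NumPy toy model under stated assumptions, not the authors' implementation: the paper's actual blocks operate on 2-D time-frequency maps with pooling and a full Transformer encoder, whereas here everything is reduced to a single-head, frame-level sketch with random weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, d):
    # Single-head scaled dot-product self-attention over frames; x: (T, d).
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    return softmax(q @ k.T / np.sqrt(d)) @ v

def conv_block(x, d):
    # Kernel-size-3 convolution over the time axis with 'same' padding; x: (T, d).
    w = rng.standard_normal((3, d, d)) / np.sqrt(3 * d)
    xp = np.pad(x, ((1, 1), (0, 0)))
    out = sum(xp[i:i + len(x)] @ w[i] for i in range(3))
    return np.maximum(out, 0.0)  # ReLU

def cnn_san_transformer(x, n_cnn=2, n_san=2):
    # CNN blocks close to the input capture local time-frequency patterns;
    # SAN blocks replace the deeper CNN blocks; a final attention layer
    # stands in for the Transformer encoder aggregating over time.
    T, d = x.shape
    for _ in range(n_cnn):
        x = conv_block(x, d)
    for _ in range(n_san):
        x = x + self_attention(x, d)  # residual SAN block
    return x + self_attention(x, d)   # Transformer encoder (attention part only)

feats = rng.standard_normal((16, 8))  # 16 frames of 8-dim features (toy input)
out = cnn_san_transformer(feats)
print(out.shape)  # (16, 8): per-frame embeddings for frame-level SED prediction
```

The design point the sketch illustrates is the hybrid split: local convolution is kept only where local spectro-temporal structure matters most (near the input), and parameter-efficient self-attention takes over in the deeper blocks.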
Pages: 806-810
Page count: 5
Related papers (50 total)
  • [1] Event detection by combining self-attention and CNN-BiGRU
    Wang K.; Wang M.; Liu X.; Tian G.; Li C.; Liu W.
    [J]. Xi'an Dianzi Keji Daxue Xuebao/Journal of Xidian University, 2022, 49 (05): 181-188
  • [2] Sound Event Detection of Weakly Labelled Data With CNN-Transformer and Automatic Threshold Optimization
    Kong, Qiuqiang; Xu, Yong; Wang, Wenwu; Plumbley, Mark D.
    [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020, 28: 2450-2460
  • [3] Weakly-Supervised Sound Event Detection with Self-Attention
    Miyazaki, Koichi; Komatsu, Tatsuya; Hayashi, Tomoki; Watanabe, Shinji; Toda, Tomoki; Takeda, Kazuya
    [J]. 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020: 66-70
  • [4] Efficient Lightweight Speaker Verification with Broadcasting CNN-Transformer and Knowledge Distillation Training of Self-Attention Maps
    Choi, Jeong-Hwan; Yang, Joon-Young; Chang, Joon-Hyuk
    [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024, 32: 4580-4595
  • [5] A hierarchical CNN-Transformer model for network intrusion detection
    Luo, Sijie; Zhao, Zhiheng; Hu, Qiyuan; Liu, Yang
    [J]. 2nd International Conference on Applied Mathematics, Modelling, and Intelligent Computing (CAMMIC 2022), 2022, 12259
  • [6] Sparse Self-Attention for Semi-Supervised Sound Event Detection
    Guan, Yadong; Xue, Jiabin; Zheng, Guibin; Han, Jiqing
    [J]. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022: 821-825
  • [7] TACT: Text attention based CNN-Transformer network for polyp segmentation
    Zhao, Yiyang; Li, Jinjiang; Hua, Zhen
    [J]. International Journal of Imaging Systems and Technology, 2024, 34 (02)
  • [8] Adaptive Memory-Controlled Self-Attention for Polyphonic Sound Event Detection
    Wang, Mei; Yao, Yu; Qiu, Hongbin; Song, Xiyu
    [J]. Symmetry-Basel, 2022, 14 (02)
  • [9] A self-attention network for smoke detection
    Jiang, Minghua; Zhao, Yaxin; Yu, Feng; Zhou, Changlong; Peng, Tao
    [J]. Fire Safety Journal, 2022, 129
  • [10] Hybrid CNN-Transformer Network for Electricity Theft Detection in Smart Grids
    Bai, Yu; Sun, Haitong; Zhang, Lili; Wu, Haoqi
    [J]. Sensors, 2023, 23 (20)