Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

被引:0
|
作者
Venkatesh, Spoorthy [1 ]
Koolagudi, Shashidhar G. [1 ]
机构
[1] Natl Inst Technol Karnataka, Surathkal 575025, India
关键词
Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES;
D O I
10.1007/978-3-031-58181-6_47
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, a novel approach to performing polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)" is introduced. The input features fed to the network are spectrograms generated from Constant Q-Transform (CQT). CQT spectrograms provided better sound event information in the audio recording than the Short Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. The temporal information is an essential factor for detecting the onset and offset of events in an audio recording. Capturing the temporal information is ensured by fusing Temporal pyramids and Bi-directional long short-term memory (LSTM) recurrent layers in deep learning architecture. Extensive experiments are carried out on three benchmark datasets, and the results of the proposed method are superior to those of the existing polyphonic SED systems.
引用
收藏
页码:554 / 564
页数:11
相关论文
共 50 条
  • [31] Sound Event Detection Based on Bidirectional Temporal Convolutional Network and Gated Recurrent Unit
    Chen Yihan
    Guo Min
    Li Zhiqiang
    20TH INT CONF ON UBIQUITOUS COMP AND COMMUNICAT (IUCC) / 20TH INT CONF ON COMP AND INFORMATION TECHNOLOGY (CIT) / 4TH INT CONF ON DATA SCIENCE AND COMPUTATIONAL INTELLIGENCE (DSCI) / 11TH INT CONF ON SMART COMPUTING, NETWORKING, AND SERV (SMARTCNS), 2021, : 445 - 450
  • [32] Augmented Strategy For Polyphonic Sound Event Detection
    Wang, Bolun
    Fu, Zhong-Hua
    Wu, Hao
    2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019, : 1496 - 1500
  • [33] A Comprehensive Review of Polyphonic Sound Event Detection
    Chan, T. K.
    Chin, Cheng Siong
    IEEE ACCESS, 2020, 8 : 103339 - 103373
  • [34] A TRACK-WISE ENSEMBLE EVENT INDEPENDENT NETWORK FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION
    Hu, Jinbo
    Cao, Yin
    Wu, Ming
    Kong, Qiuqiang
    Yang, Feiran
    Plumbley, Mark D.
    Yang, Jun
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 9196 - 9200
  • [35] End-to-End Polyphonic Sound Event Detection Using Convolutional Recurrent Neural Networks with Learned Time-Frequency Representation Input
    Cakir, Emre
    Virtanen, Tuomas
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018,
  • [36] Attention mechanism combined with residual recurrent neural network for sound event detection and localization
    Lan, Chaofeng
    Zhang, Lei
    Zhang, Yuanyuan
    Fu, Lirong
    Sun, Chao
    Han, Yulan
    Zhang, Meng
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2022, 2022 (01)
  • [37] Diffusion-Based Convolutional Recurrent Neural Network for Improving Sound Event Detection
    Al Dabel, Maryam M.
    PROCEEDINGS OF NINTH INTERNATIONAL CONGRESS ON INFORMATION AND COMMUNICATION TECHNOLOGY, VOL 8, ICICT 2024, 2024, 1004 : 173 - 183
  • [38] Attention mechanism combined with residual recurrent neural network for sound event detection and localization
    Chaofeng Lan
    Lei Zhang
    Yuanyuan Zhang
    Lirong Fu
    Chao Sun
    Yulan Han
    Meng Zhang
    EURASIP Journal on Audio, Speech, and Music Processing, 2022
  • [39] BLSTM-HMM HYBRID SYSTEM COMBINED WITH SOUND ACTIVITY DETECTION NETWORK FOR POLYPHONIC SOUND EVENT DETECTION
    Hayashi, Tomoki
    Watanabe, Shinji
    Toda, Tomoki
    Hori, Takaaki
    Le Roux, Jonathan
    Takeda, Kazuya
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 766 - 770
  • [40] Convolutional Neural Networks with Multi-task Loss for Polyphonic Sound Event Detection
    Liu, Huang
    Wang, Xiu
    Guan, Fa-Qian
    Hu, Jin-Sen
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND APPLICATION ENGINEERING (CSAE2018), 2018,