Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Authors
Venkatesh, Spoorthy [1]
Koolagudi, Shashidhar G. [1]
Affiliations
[1] National Institute of Technology Karnataka, Surathkal 575025, India
Keywords
Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; Classification; Scenes
DOI
10.1007/978-3-031-58181-6_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel approach to polyphonic Sound Event Detection (SED) built on a new deep learning architecture, the Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN). The network takes as input spectrograms generated with the Constant Q-Transform (CQT). CQT spectrograms provided better sound event information for the audio recordings than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onset and offset of events in an audio recording; the architecture captures it by fusing temporal pyramids with bidirectional long short-term memory (Bi-LSTM) recurrent layers. Extensive experiments on three benchmark datasets show that the proposed method outperforms existing polyphonic SED systems.
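To illustrate why a CQT front end differs from an STFT, the sketch below computes the geometrically spaced center frequencies and per-bin window lengths of a constant-Q filter bank. It is a minimal illustration using the standard CQT definitions, not the authors' feature pipeline: the function name `cqt_bins` and all parameter values (`f_min`, `bins_per_octave`, `n_bins`, `sample_rate`) are assumptions chosen for the example and do not come from the paper.

```python
import math


def cqt_bins(f_min=32.70, bins_per_octave=12, n_bins=84, sample_rate=44100):
    """Return the shared Q factor and (center_hz, window_samples) per CQT bin.

    All default values are illustrative assumptions, not settings from the paper.
    """
    # Every bin shares one quality factor Q = f_k / bandwidth_k.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    bins = []
    for k in range(n_bins):
        # Center frequencies are spaced geometrically, not linearly as in an STFT.
        f_k = f_min * 2.0 ** (k / bins_per_octave)
        # Window length shrinks as frequency rises, so Q stays constant.
        n_k = math.ceil(q * sample_rate / f_k)
        bins.append((f_k, n_k))
    return q, bins


q, bins = cqt_bins()
lowest_hz, longest_window = bins[0]
highest_hz, shortest_window = bins[-1]
```

Low-frequency bins get long windows (fine frequency resolution) and high-frequency bins get short ones (fine time resolution), whereas an STFT uses a single window length for every bin.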
Pages: 554-564
Number of pages: 11
Related papers (50 in total)
  • [1] POLYPHONIC SOUND EVENT DETECTION USING TRANSPOSED CONVOLUTIONAL RECURRENT NEURAL NETWORK
    Chatterjee, Chandra Churh
    Mulimani, Manjunath
    Koolagudi, Shashidhar G.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 661 - 665
  • [2] Relational recurrent neural networks for polyphonic sound event detection
    Ma, Junbo
    Wang, Ruili
    Ji, Wanting
    Zheng, Hao
    Zhu, En
    Yin, Jianping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (20) : 29509 - 29527
  • [4] Convolutional Recurrent Neural Networks for Polyphonic Sound Event Detection
    Cakir, Emre
    Parascandolo, Giambattista
    Heittola, Toni
    Huttunen, Heikki
    Virtanen, Tuomas
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1291 - 1303
  • [5] Polyphonic Sound Event Detection by Using Capsule Neural Networks
    Vesperini, Fabio
    Gabrielli, Leonardo
    Principi, Emanuele
    Squartini, Stefano
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (02) : 310 - 322
  • [6] RECURRENT NEURAL NETWORKS FOR POLYPHONIC SOUND EVENT DETECTION IN REAL LIFE RECORDINGS
    Parascandolo, Giambattista
    Huttunen, Heikki
    Virtanen, Tuomas
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 6440 - 6444
  • [7] Filterbank Learning for Deep Neural Network Based Polyphonic Sound Event Detection
    Cakir, Emre
    Ozan, Ezgi Can
    Virtanen, Tuomas
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016, : 3399 - 3406
  • [8] Temporal Pyramid Recurrent Neural Network
    Ma, Qianli
    Lin, Zhenxi
    Chen, Enhuan
    Cottrell, Garrison W.
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 5061 - 5068
  • [9] SOUND EVENT DETECTION USING SPATIAL FEATURES AND CONVOLUTIONAL RECURRENT NEURAL NETWORK
    Adavanne, Sharath
    Pertila, Pasi
    Virtanen, Tuomas
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 771 - 775
  • [10] A FIRST ATTEMPT AT POLYPHONIC SOUND EVENT DETECTION USING CONNECTIONIST TEMPORAL CLASSIFICATION
    Wang, Yun
    Metze, Florian
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 2986 - 2990