Polyphonic Sound Event Detection Using Modified Recurrent Temporal Pyramid Neural Network

Authors
Venkatesh, Spoorthy [1]
Koolagudi, Shashidhar G. [1]
Affiliations
[1] Natl Inst Technol Karnataka, Surathkal 575025, India
Keywords
Polyphonic Sound Event Detection (SED); Constant Q-Transform (CQT); Deep learning; Modified Recurrent Temporal Pyramid Network; CLASSIFICATION; SCENES
DOI
10.1007/978-3-031-58181-6_47
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, a novel approach to polyphonic Sound Event Detection (SED) is presented. A new deep learning architecture named "Modified Recurrent Temporal Pyramid Neural Network (MR-TPNN)" is introduced. The input features fed to the network are spectrograms generated with the Constant Q-Transform (CQT). CQT spectrograms provided better sound event information from the audio recording than the Short-Time Fourier Transform (STFT) and Fast Fourier Transform (FFT) methods. Temporal information is essential for detecting the onsets and offsets of events in an audio recording. It is captured by fusing temporal pyramids with bi-directional long short-term memory (LSTM) recurrent layers in the deep learning architecture. Extensive experiments are carried out on three benchmark datasets, and the proposed method outperforms existing polyphonic SED systems.
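The contrast the abstract draws between CQT and STFT features comes down to how the frequency bins are spaced. The following is a minimal sketch of that difference, not the authors' feature pipeline: CQT center frequencies follow the standard geometric rule f_k = f_min * 2^(k/B) with a constant quality factor Q = 1 / (2^(1/B) - 1), while STFT bins are linearly spaced. The function names and all parameter values (minimum frequency, bin counts, sample rate, FFT size) are illustrative assumptions and do not come from the paper.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced CQT center frequencies: f_k = f_min * 2**(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def cqt_quality_factor(bins_per_octave):
    """Constant ratio of center frequency to bandwidth: Q = 1 / (2**(1/B) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def stft_bin_frequencies(sample_rate, n_fft):
    """Linearly spaced STFT bin frequencies: f_k = k * sample_rate / n_fft."""
    return [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]


# Illustrative parameters only (assumptions, not values reported in the paper).
F_MIN_HZ = 32.70          # roughly the note C1
BINS_PER_OCTAVE = 12      # one bin per semitone
N_BINS = 84               # seven octaves
SAMPLE_RATE = 44100
N_FFT = 2048

cqt_freqs = cqt_center_frequencies(F_MIN_HZ, N_BINS, BINS_PER_OCTAVE)
q_factor = cqt_quality_factor(BINS_PER_OCTAVE)
stft_freqs = stft_bin_frequencies(SAMPLE_RATE, N_FFT)

# CQT: the ratio between neighbouring bins is constant (log-frequency axis).
cqt_ratio = cqt_freqs[1] / cqt_freqs[0]
# STFT: the difference between neighbouring bins is constant (linear axis).
stft_step = stft_freqs[1] - stft_freqs[0]

if __name__ == "__main__":
    print(f"CQT bins: {len(cqt_freqs)}, neighbour ratio: {cqt_ratio:.4f}, Q: {q_factor:.2f}")
    print(f"STFT bins: {len(stft_freqs)}, neighbour step: {stft_step:.2f} Hz")
```

Because the CQT spacing is geometric, low frequencies get finer resolution in Hz than high frequencies, whereas the STFT spends the same resolution everywhere. Whether this accounts for the improvement reported in the abstract is not stated in the source.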
Pages: 554-564
Number of pages: 11