A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification

Cited by: 49
Authors
Wang, Yun [1]
Metze, Florian [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
sound event detection (SED); connectionist temporal classification (CTC); transfer learning; convolutional neural networks (CNN)
DOI
10.21437/Interspeech.2017-1469
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Sound event detection is the task of detecting the type, onset time, and offset time of sound events in audio streams. The mainstream solution is recurrent neural networks (RNNs), which usually predict the probability of each sound event at every time step. Connectionist temporal classification (CTC) has been applied in order to relax the need for exact annotations of onset and offset times: the CTC output layer is expected to generate a peak for each event boundary where the acoustic signal is most salient. However, with limited training data, the CTC network has been found to train slowly, and generalize poorly to new data. In this paper, we try to introduce knowledge learned from a much larger corpus into the CTC network. We train two variants of SoundNet, a deep convolutional network that takes the audio tracks of videos as input, and tries to approximate the visual information extracted by an image recognition network. A lower part of SoundNet or its variants is then used as a feature extractor for the CTC network to perform sound event detection. We show that the new feature extractor greatly accelerates the convergence of the CTC network, and slightly improves the generalization.
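The abstract's idea that CTC only needs the order of event boundaries, not their exact times, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `(label, onset, offset)` annotation format, the `_on`/`_off` token names, the `-` blank symbol, and both helper functions are hypothetical. The first helper turns timed annotations into an ordered boundary-token sequence of the kind a CTC layer could be trained on; the second applies the standard CTC collapse rule (merge consecutive repeats, then drop blanks) to a frame-level output.

```python
from typing import List, Tuple

# Hypothetical annotation format: (label, onset_sec, offset_sec)
Event = Tuple[str, float, float]


def events_to_boundary_sequence(events: List[Event]) -> List[str]:
    """Turn timed events into an ordered sequence of boundary tokens."""
    boundaries = []
    for label, onset, offset in events:
        boundaries.append((onset, f"{label}_on"))
        boundaries.append((offset, f"{label}_off"))
    # Times are used only to order the boundaries, then discarded.
    boundaries.sort(key=lambda b: b[0])
    return [token for _, token in boundaries]


def ctc_greedy_collapse(frame_tokens: List[str], blank: str = "-") -> List[str]:
    """Standard CTC collapse: merge consecutive repeats, then drop blanks."""
    collapsed = []
    previous = None
    for token in frame_tokens:
        if token != previous and token != blank:
            collapsed.append(token)
        previous = token
    return collapsed


# Two overlapping (polyphonic) events; the values are made up.
events = [("dog", 0.5, 2.0), ("car", 1.2, 3.4)]
target = events_to_boundary_sequence(events)

# A made-up frame-level best path with one peak per boundary.
frames = ["-", "dog_on", "dog_on", "-", "car_on", "-", "dog_off", "-", "-", "car_off"]
decoded = ctc_greedy_collapse(frames)

print(target)
print(decoded)
```

In this toy case the collapsed frame sequence matches the target sequence even though no frame-level timing was supplied, which is the sense in which CTC relaxes the need for exact onset and offset annotations.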
Pages: 3097-3101
Number of pages: 5
Related papers
50 in total
  • [21] NASAL SPEECH SOUNDS DETECTION USING CONNECTIONIST TEMPORAL CLASSIFICATION
    Cernak, Milos
    Tong, Sibo
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5574 - 5578
  • [22] Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
    Lin, Liwei
    Wang, Xiangdong
    Liu, Hong
    Qian, Yueliang
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1466 - 1478
  • [23] Sound event classification using neural networks and feature selection based methods
    Ahmed, Ammar
    Serrestou, Youssef
    Raoof, Kosai
    Diouris, Jean-Francois
    2021 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2021, : 298 - 303
  • [24] Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks
    Cakir, Emre
    Heittola, Toni
    Huttunen, Heikki
    Virtanen, Tuomas
    2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
  • [25] POLYPHONIC SOUND EVENT DETECTION USING TRANSPOSED CONVOLUTIONAL RECURRENT NEURAL NETWORK
    Chatterjee, Chandra Churh
    Mulimani, Manjunath
    Koolagudi, Shashidhar G.
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 661 - 665
  • [26] SOUND EVENT CLASSIFICATION BASED ON FEATURE INTEGRATION, RECURSIVE FEATURE ELIMINATION AND STRUCTURED CLASSIFICATION
    Tran, Huy Dat
    Li, Haizhou
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 177 - 180
  • [27] Polyphonic sound event localization and detection using channel-wise FusionNet
    Spoorthy, V.
    Koolagudi, Shashidhar G.
    APPLIED INTELLIGENCE, 2024, 54 (06) : 5015 - 5026
  • [28] Polyphonic sound event localization and detection based on Multiple Attention Fusion ResNet
    Zhang S.
    Zhang Y.
    Liao Y.
    Pang K.
    Wan Z.
    Zhou S.
    Mathematical Biosciences and Engineering, 2024, 21 (02) : 2004 - 2023
  • [29] SALSA-LITE: A FAST AND EFFECTIVE FEATURE FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION WITH MICROPHONE ARRAYS
    Thi Ngoc Tho Nguyen
    Jones, Douglas L.
    Watcharasupat, Karn N.
    Huy Phan
    Gan, Woon-Seng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 716 - 720
  • [30] CROSS-ACOUSTIC TRANSFER LEARNING FOR SOUND EVENT CLASSIFICATION
    Lim, Hyungjun
    Kim, Myung Jong
    Kim, Hoirin
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2504 - 2508