A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification

Cited by: 49

Authors:

Wang, Yun [1]

Metze, Florian [1]

Affiliation:

[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA

Source:

18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION | 2017

Keywords:

sound event detection (SED); connectionist temporal classification (CTC); transfer learning; convolutional neural networks (CNN)

DOI:

10.21437/Interspeech.2017-1469

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Sound event detection is the task of detecting the type, onset time, and offset time of sound events in audio streams. The mainstream solution is recurrent neural networks (RNNs), which usually predict the probability of each sound event at every time step. Connectionist temporal classification (CTC) has been applied in order to relax the need for exact annotations of onset and offset times: the CTC output layer is expected to generate a peak for each event boundary where the acoustic signal is most salient. However, with limited training data, the CTC network has been found to train slowly and generalize poorly to new data. In this paper, we try to introduce knowledge learned from a much larger corpus into the CTC network. We train two variants of SoundNet, a deep convolutional network that takes the audio tracks of videos as input and tries to approximate the visual information extracted by an image recognition network. A lower part of SoundNet or its variants is then used as a feature extractor for the CTC network to perform sound event detection. We show that the new feature extractor greatly accelerates the convergence of the CTC network and slightly improves the generalization.
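As an illustration of why CTC relaxes the need for exact timing, the following minimal sketch turns a set of annotated events into an ordered sequence of boundary tokens, which is the kind of target a CTC output layer could be trained on. The function name `boundary_sequence` and the `_on` / `_off` token naming are assumptions made for this example and are not taken from the paper; the timestamps are used only to establish the order of the boundaries.

```python
from typing import List, Tuple

# Hypothetical event format: (label, onset_seconds, offset_seconds)
Event = Tuple[str, float, float]


def boundary_sequence(events: List[Event]) -> List[str]:
    """Return event boundaries as an ordered list of tokens.

    Only the order of the boundaries is kept; the exact times are discarded.
    """
    boundaries = []
    for label, onset, offset in events:
        if offset < onset:
            raise ValueError(f"offset precedes onset for event {label!r}")
        # The middle element breaks ties: onsets sort before offsets
        # when two boundaries share the same timestamp.
        boundaries.append((onset, 0, f"{label}_on"))
        boundaries.append((offset, 1, f"{label}_off"))
    boundaries.sort()
    return [token for _, _, token in boundaries]


# Two overlapping (polyphonic) events.
events = [("speech", 0.5, 3.0), ("dog_bark", 1.2, 1.8)]
tokens = boundary_sequence(events)
print(tokens)
```

Because the token sequence carries no timestamps, two recordings with the same order of boundaries but different timings would yield the same target sequence under this scheme.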

Pages: 3097-3101

Number of pages: 5

50 items in total

[21] NASAL SPEECH SOUNDS DETECTION USING CONNECTIONIST TEMPORAL CLASSIFICATION
Cernak, Milos
Tong, Sibo
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5574 - 5578
[22] Specialized Decision Surface and Disentangled Feature for Weakly-Supervised Polyphonic Sound Event Detection
Lin, Liwei
Wang, Xiangdong
Liu, Hong
Qian, Yueliang
IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2020, 28 : 1466 - 1478
[23] Sound event classification using neural networks and feature selection based methods
Ahmed, Ammar
Serrestou, Youssef
Raoof, Kosai
Diouris, Jean-Francois
2021 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2021, : 298 - 303
[24] Polyphonic Sound Event Detection Using Multi Label Deep Neural Networks
Cakir, Emre
Heittola, Toni
Huttunen, Heikki
Virtanen, Tuomas
2015 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2015,
[25] POLYPHONIC SOUND EVENT DETECTION USING TRANSPOSED CONVOLUTIONAL RECURRENT NEURAL NETWORK
Chatterjee, Chandra Churh
Mulimani, Manjunath
Koolagudi, Shashidhar G.
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 661 - 665
[26] SOUND EVENT CLASSIFICATION BASED ON FEATURE INTEGRATION, RECURSIVE FEATURE ELIMINATION AND STRUCTURED CLASSIFICATION
Tran, Huy Dat
Li, Haizhou
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 177 - 180
[27] Polyphonic sound event localization and detection using channel-wise FusionNet
Spoorthy, V.
Koolagudi, Shashidhar G.
APPLIED INTELLIGENCE, 2024, 54 (06) : 5015 - 5026
[28] Polyphonic sound event localization and detection based on Multiple Attention Fusion ResNet
Zhang S.
Zhang Y.
Liao Y.
Pang K.
Wan Z.
Zhou S.
Mathematical Biosciences and Engineering, 2024, 21 (02) : 2004 - 2023
[29] SALSA-LITE: A FAST AND EFFECTIVE FEATURE FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION WITH MICROPHONE ARRAYS
Thi Ngoc Tho Nguyen
Jones, Douglas L.
Watcharasupat, Karn N.
Huy Phan
Gan, Woon-Seng
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 716 - 720
[30] CROSS-ACOUSTIC TRANSFER LEARNING FOR SOUND EVENT CLASSIFICATION
Lim, Hyungjun
Kim, Myung Jong
Kim, Hoirin
2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2504 - 2508