A Robust Framework For Acoustic Scene Classification

Cited by: 19
Authors
Lam Pham [1 ]
McLoughlin, Ian [1 ]
Huy Phan [1 ]
Palaniappan, Ramaswamy [1 ]
Affiliations
[1] Univ Kent, Sch Comp, Medway, Kent, England
Source
INTERSPEECH 2019 | 2019
Keywords
Machine hearing; acoustic scene classification; convolutional neural network; deep neural network; spectrogram; log-Mel; Gammatone filter; constant Q transform;
DOI
10.21437/Interspeech.2019-1841
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification
100104; 100213
Abstract
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter and constant Q transform. At the back-end, we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers for post-trained feature adaptation and classification. We also explore a data augmentation technique for these features that generates a variety of intermediate data, reinforcing model learning ability, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and highly competitive with state-of-the-art systems.
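The augmentation the abstract describes, generating intermediate data between training examples, can be illustrated with a mixup-style blend of two spectrograms and their labels. This is only a minimal sketch under the assumption of a mixup-like scheme; the paper's exact augmentation procedure may differ, and the function name and parameters here are illustrative.

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.4, rng=None):
    """Blend two spectrogram examples and their one-hot labels.

    Sketch of mixup-style augmentation: a Beta-distributed coefficient
    interpolates both the inputs and the (soft) labels, producing an
    intermediate training example. `alpha` and the Beta prior are
    assumptions, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2   # blended time-frequency input
    y = lam * y1 + (1.0 - lam) * y2   # correspondingly softened label
    return x, y
```

Because the labels are interpolated with the same coefficient as the inputs, the resulting soft targets remain a valid probability distribution, which is what lets such intermediate examples regularise the classifier near class boundaries.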
Pages: 3634 - 3638
Page count: 5
Related papers
50 items in total
  • [21] LPAI-A Complete AIoT Framework Based on LPWAN Applicable to Acoustic Scene Classification Scenarios
    Jing, Xinru
    Tian, Xin
    Du, Chong
    SENSORS, 2022, 22 (23)
  • [22] Towards Speech Robustness for Acoustic Scene Classification
    Liu, Shuo
    Triantafyllopoulos, Andreas
    Ren, Zhao
    Schuller, Bjoern W.
    INTERSPEECH 2020, 2020, : 3087 - 3091
  • [23] Acoustic Scene Classification using Audio Tagging
    Jung, Jee-weon
    Shim, Hye-jin
    Kim, Ju-ho
    Kim, Seung-bin
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020, : 1176 - 1180
  • [24] Deep semantic learning for acoustic scene classification
    Shao, Yun-Fei
    Ma, Xin-Xin
    Ma, Yong
    Zhang, Wei-Qiang
    EURASIP Journal on Audio, Speech, and Music Processing, 2024
  • [25] Neural Architecture Search on Acoustic Scene Classification
    Li, Jixiang
    Liang, Chuming
    Zhang, Bo
    Wang, Zhao
    Xiang, Fei
    Chu, Xiangxiang
    INTERSPEECH 2020, 2020, : 1171 - 1175
  • [26] Temporal transformer networks for acoustic scene classification
    Zhang, Teng
    Zhang, Kailai
    Wu, Ji
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1349 - 1353
  • [27] Sound recurrence analysis for acoustic scene classification
    Abesser, Jakob
    Liang, Zhiwei
    Seeber, Bernhard
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2025, 2025 (01):
  • [28] Deep Scalogram Representations for Acoustic Scene Classification
    Ren, Zhao
    Qian, Kun
    Zhang, Zixing
    Pandit, Vedhas
    Baird, Alice
    Schuller, Bjoern
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2018, 5 (03) : 662 - 669
  • [29] Sparse Representation Frameworks for Acoustic Scene Classification
    Tyagi, Akansha
    Rajan, Padmanabhan
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 177 - 188
  • [30] Light weight architecture for acoustic scene classification
    Lim, Soyoung
    Kwak, Il-Youp
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (06) : 979 - 993