A Robust Framework For Acoustic Scene Classification

Cited by: 19
Authors
Lam Pham [1 ]
McLoughlin, Ian [1 ]
Huy Phan [1 ]
Palaniappan, Ramaswamy [1 ]
Affiliations
[1] Univ Kent, Sch Comp, Medway, Kent, England
Source
INTERSPEECH 2019 | 2019
Keywords
Machine hearing; acoustic scene classification; convolutional neural network; deep neural network; spectrogram; log-Mel; Gammatone filter; constant Q transform;
DOI
10.21437/Interspeech.2019-1841
CLC classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject classification
100104; 100213
Abstract
Acoustic scene classification (ASC) using front-end time-frequency features and back-end neural network classifiers has demonstrated good performance in recent years. However, a profusion of systems has arisen to suit different tasks and datasets, utilising different feature and classifier types. This paper aims at a robust framework that can explore and utilise a range of different time-frequency features and neural networks, either singly or merged, to achieve good classification performance. In particular, we exploit three different types of front-end time-frequency feature: log-energy Mel filter, Gammatone filter and constant Q transform. At the back-end, we evaluate an effective two-stage model that exploits a Convolutional Neural Network for pre-trained feature extraction, followed by Deep Neural Network classifiers for post-trained feature adaptation and classification. We also explore a data augmentation technique for these features that generates a variety of intermediate data, reinforcing model learning ability, particularly for marginal cases. We assess performance on the DCASE2016 dataset, demonstrating good classification accuracies exceeding 90%, significantly outperforming the DCASE2016 baseline and highly competitive with state-of-the-art systems.
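The augmentation the abstract describes, generating intermediate data between training examples, can be illustrated with a mixup-style blend of two spectrograms and their labels. This is only a minimal sketch under the assumption of a mixup-like scheme; the paper's exact augmentation procedure may differ, and the function name and parameters here are illustrative.

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.4, rng=None):
    """Blend two spectrogram examples and their one-hot labels.

    Sketch of mixup-style augmentation: a Beta-distributed coefficient
    interpolates both the inputs and the (soft) labels, producing an
    intermediate training example. `alpha` and the Beta prior are
    assumptions, not taken from the paper.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2   # blended time-frequency input
    y = lam * y1 + (1.0 - lam) * y2   # correspondingly softened label
    return x, y
```

Because the labels are interpolated with the same coefficient as the inputs, the resulting soft targets remain a valid probability distribution, which is what lets such intermediate examples regularise the classifier near class boundaries.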
Pages: 3634 - 3638
Page count: 5
Related papers
50 items in total
  • [21] LPAI-A Complete AIoT Framework Based on LPWAN Applicable to Acoustic Scene Classification Scenarios
    Jing, Xinru
    Tian, Xin
    Du, Chong
    SENSORS, 2022, 22 (23)
  • [22] Towards Speech Robustness for Acoustic Scene Classification
    Liu, Shuo
    Triantafyllopoulos, Andreas
    Ren, Zhao
    Schuller, Bjoern W.
    INTERSPEECH 2020, 2020, : 3087 - 3091
  • [23] Acoustic Scene Classification using Audio Tagging
    Jung, Jee-weon
    Shim, Hye-jin
    Kim, Ju-ho
    Kim, Seung-bin
    Yu, Ha-Jin
    INTERSPEECH 2020, 2020, : 1176 - 1180
  • [24] Deep semantic learning for acoustic scene classification
    Shao, Yun-Fei
    Ma, Xin-Xin
    Ma, Yong
    Zhang, Wei-Qiang
    EURASIP Journal on Audio, Speech, and Music Processing, 2024
  • [25] Neural Architecture Search on Acoustic Scene Classification
    Li, Jixiang
    Liang, Chuming
    Zhang, Bo
    Wang, Zhao
    Xiang, Fei
    Chu, Xiangxiang
    INTERSPEECH 2020, 2020, : 1171 - 1175
  • [26] Temporal transformer networks for acoustic scene classification
    Zhang, Teng
    Zhang, Kailai
    Wu, Ji
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1349 - 1353
  • [27] Sound recurrence analysis for acoustic scene classification
    Abesser, Jakob
    Liang, Zhiwei
    Seeber, Bernhard
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2025, 2025 (01):
  • [28] Deep Scalogram Representations for Acoustic Scene Classification
    Ren, Zhao
    Qian, Kun
    Zhang, Zixing
    Pandit, Vedhas
    Baird, Alice
    Schuller, Bjoern
    IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2018, 5 (03) : 662 - 669
  • [29] Sparse Representation Frameworks for Acoustic Scene Classification
    Tyagi, Akansha
    Rajan, Padmanabhan
    SPEECH AND COMPUTER, SPECOM 2023, PT I, 2023, 14338 : 177 - 188
  • [30] Light weight architecture for acoustic scene classification
    Lim, Soyoung
    Kwak, Il-Youp
    KOREAN JOURNAL OF APPLIED STATISTICS, 2021, 34 (06) : 979 - 993