ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Cited by: 8
Authors
Guzhov, Andrey [1, 2]
Raue, Federico [1]
Hees, Joern [1]
Dengel, Andreas [1, 2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Kaiserslautern, Kaiserslautern, Germany
Keywords
audio; classification; ESC; Fourier transform; fbsp-wavelet;
DOI
10.1109/IJCNN52387.2021.9533654
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer that is based on complex frequency B-spline (fbsp) wavelets. When used with a high-performance audio classification model, the proposed fbsp-layer provides an accuracy improvement over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets, ImageNet and AudioSet, for weight initialization. Our proposed model outperforms other approaches, achieving accuracies of 95.20% on the ESC-50 dataset and 89.14% on the UrbanSound8K dataset. Additionally, we assess how much the proposed layer increases model robustness against additive white Gaussian noise and against a reduced effective sample rate, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations compared with STFT-based training. For the sake of reproducibility, our code is made available.
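The two ingredients named in the abstract, a complex frequency B-spline wavelet and additive white Gaussian noise, can be illustrated with a minimal NumPy sketch. It assumes the textbook fbsp definition, psi(t) = sqrt(fb) * sinc(fb * t / m)^m * exp(2*pi*i*fc*t), which may differ from the parameterization used in the paper's trainable layer. The function names `fbsp_wavelet` and `add_awgn`, and the default parameter values, are hypothetical and chosen only for illustration.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Textbook complex frequency B-spline wavelet (assumed form, not the paper's layer).

    m  : spline order (integer >= 1)
    fb : bandwidth parameter
    fc : center frequency
    """
    t = np.asarray(t, dtype=float)
    # np.sinc(x) computes sin(pi*x)/(pi*x), i.e. the normalized sinc.
    envelope = np.sqrt(fb) * np.sinc(fb * t / m) ** m
    return envelope * np.exp(2j * np.pi * fc * t)


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


if __name__ == "__main__":
    # Sample the wavelet on a short time grid and perturb a test tone at 10 dB SNR.
    t = np.linspace(-4.0, 4.0, 257)
    psi = fbsp_wavelet(t, m=2, fb=1.0, fc=1.0)
    tone = np.sin(2 * np.pi * 5.0 * np.linspace(0.0, 1.0, 8000))
    noisy = add_awgn(tone, snr_db=10.0, rng=np.random.default_rng(0))
    print(psi.shape, noisy.shape)
```

A robustness check in the spirit of the abstract would feed `add_awgn` outputs at several SNR levels to a trained classifier and compare accuracy against the clean input; that evaluation loop is not shown here.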
Pages: 8
Related papers
50 in total
  • [1] LEARNING SEPARABLE TIME-FREQUENCY FILTERBANKS FOR AUDIO CLASSIFICATION
    Pu, Jie
    Panagakis, Yannis
    Pantic, Maja
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3000 - 3004
  • [2] Time-Frequency Feature Fusion for Noise Robust Audio Event Classification
    McLoughlin, Ian
    Xie, Zhipeng
    Song, Yan
    Phan, Huy
    Palaniappan, Ramaswamy
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (03) : 1672 - 1687
  • [3] Subband Time-Frequency Image Texture Features for Robust Audio Surveillance
    Sharan, Roneel V.
    Moir, Tom J.
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2015, 10 (12) : 2605 - 2615
  • [4] Robust Time-Frequency Watermarking Based on Improved S Transformation
    Deng Minghui
    Zeng Qingshuang
    Li Yanjun
    MATERIALS, MECHATRONICS AND AUTOMATION, PTS 1-3, 2011, 467-469 : 146 - +
  • [5] Time-Frequency Processing for Spatial Audio
    Rumsey, Francis
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2010, 58 (7-8) : 655 - 659
  • [6] Robust Image Information Identification Algorithm Based on Time-frequency Transformation
    Deng Minghui
    SUSTAINABLE DEVELOPMENT OF NATURAL RESOURCES, PTS 1-3, 2013, 616-618 : 2214 - 2218
  • [7] Robust time-frequency distributions
    Katkovnik, W
    Djurovic, I
    Stankovic, LJ
    ISSPA 2001: SIXTH INTERNATIONAL SYMPOSIUM ON SIGNAL PROCESSING AND ITS APPLICATIONS, VOLS 1 AND 2, PROCEEDINGS, 2001, : 156 - 157
  • [8] ROBUST UNDERDETERMINED BLIND AUDIO SOURCE SEPARATION OF SPARSE SIGNALS IN THE TIME-FREQUENCY DOMAIN
    Sbai, Si Mohamed Aziz
    Aissa-El-Bey, Abdeldjalil
    Pastor, Dominique
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 3716 - 3719
  • [9] Robust Audio Information Hiding Based on Stereo Phase Difference in Time-frequency Domain
    Ono, Nobutaka
    2014 TENTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING (IIH-MSP 2014), 2014, : 260 - 263
  • [10] AUDIO CLASSIFICATION FROM TIME-FREQUENCY TEXTURE
    Yu, Guoshen
    Slotine, Jean-Jacques
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 1677 - +