ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Cited by: 10
Authors
Guzhov, Andrey [1, 2]
Raue, Federico [1]
Hees, Joern [1]
Dengel, Andreas [1, 2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Kaiserslautern, Kaiserslautern, Germany
Keywords
audio; classification; ESC; Fourier transform; fbsp-wavelet
DOI
10.1109/IJCNN52387.2021.9533654
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Environmental Sound Classification (ESC) is a rapidly evolving field in which techniques from the visual domain have recently proven advantageous for audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise for advancing ESC as a whole. In this paper, we present a new time-frequency transformation layer based on complex frequency B-spline (fbsp) wavelets. Used with a high-performance audio classification model, the proposed fbsp-layer improves accuracy over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets, ImageNet and AudioSet, for weight initialization. Our proposed model outperforms other approaches, achieving accuracies of 95.20% on ESC-50 and 89.14% on UrbanSound8K. Additionally, we assess how the proposed layer affects model robustness against additive white Gaussian noise and a reduced effective sample rate, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations compared with STFT-based training. For the sake of reproducibility, our code is made available.
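To make the two technical ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard textbook definition of the complex frequency B-spline wavelet, psi(t) = sqrt(fb) * sinc(fb * t / m)^m * exp(2*pi*i*fc*t), with fixed parameters, whereas the paper describes a trainable layer. The function names `fbsp_wavelet` and `add_awgn`, their parameters, and the SNR-based noise scaling are illustrative assumptions.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Evaluate a complex frequency B-spline (fbsp) wavelet at times t.

    Assumed textbook form (not taken from the paper):
        psi(t) = sqrt(fb) * sinc(fb * t / m)**m * exp(2j * pi * fc * t)
    m  : integer order, fb : bandwidth, fc : center frequency.
    np.sinc is the normalized sinc, sin(pi x) / (pi x).
    """
    t = np.asarray(t, dtype=float)
    envelope = np.sqrt(fb) * np.sinc(fb * t / m) ** m
    return envelope * np.exp(2j * np.pi * fc * t)


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the result has roughly snr_db SNR."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


if __name__ == "__main__":
    # Sample the wavelet on a short symmetric grid.
    t = np.linspace(-4.0, 4.0, 9)
    print(np.abs(fbsp_wavelet(t, m=2, fb=1.0, fc=1.0)))

    # Perturb a 440 Hz tone sampled at 16 kHz with noise at 10 dB SNR.
    time = np.arange(16000) / 16000.0
    tone = np.sin(2 * np.pi * 440.0 * time)
    noisy = add_awgn(tone, snr_db=10.0, rng=np.random.default_rng(0))
    print(noisy.shape)
```

In this sketch the envelope is a power of a sinc function, so its zeros fall at multiples of m / fb, and the complex exponential shifts the wavelet's spectrum to the center frequency fc. A robustness check of the kind the abstract mentions would compare classification accuracy on clean and perturbed inputs; that evaluation is not reproduced here.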
Pages: 8