ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Cited by: 10
Authors
Guzhov, Andrey [1 ,2 ]
Raue, Federico [1 ]
Hees, Joern [1 ]
Dengel, Andreas [1 ,2 ]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Kaiserslautern, Kaiserslautern, Germany
Keywords
audio; classification; ESC; Fourier transform; fbsp-wavelet;
DOI
10.1109/IJCNN52387.2021.9533654
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer that is based on complex frequency B-spline (fbsp) wavelets. When used with a high-performance audio classification model, the proposed fbsp-layer provides an accuracy improvement over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets for weight initialization: ImageNet and AudioSet. Our proposed model outperforms other approaches, achieving accuracies of 95.20% on the ESC-50 and 89.14% on the UrbanSound8K datasets. Additionally, we assess how much the proposed layer increases model robustness against additive white Gaussian noise and a reduced effective sample rate, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations compared to STFT-based training. For the sake of reproducibility, our code is made available.
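The two signal-processing ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It assumes the textbook form of the complex frequency B-spline wavelet, psi(t) = sqrt(fb) * sinc(fb*t/m)^m * exp(2*pi*i*fc*t), and a plain additive white Gaussian noise model at a target signal-to-noise ratio. The function names, parameter values, and test signal are hypothetical choices for illustration; the record does not describe how the authors' fbsp-layer is parameterized, so this is not their implementation.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Textbook complex frequency B-spline wavelet (assumed form).

    m  : integer order, fb : bandwidth, fc : center frequency.
    """
    # np.sinc is the normalized sinc: sin(pi * x) / (pi * x).
    envelope = np.sqrt(fb) * np.sinc(fb * t / m) ** m
    carrier = np.exp(2j * np.pi * fc * t)
    return envelope * carrier


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    signal_power = np.mean(np.abs(signal) ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Illustrative values only (not taken from the paper).
t = np.linspace(-4.0, 4.0, 801)
psi = fbsp_wavelet(t, m=2, fb=1.0, fc=1.5)

rng = np.random.default_rng(0)
time_axis = np.linspace(0.0, 1.0, 8000, endpoint=False)
tone = np.sin(2.0 * np.pi * 5.0 * time_axis)
noisy = add_awgn(tone, snr_db=10.0, rng=rng)

# Empirical SNR of the perturbed signal, in dB.
measured_snr_db = 10.0 * np.log10(np.mean(tone ** 2) / np.mean((noisy - tone) ** 2))
```

In this form the envelope peaks at sqrt(fb) at t = 0, and the empirical SNR should land close to the requested value for a signal of this length.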
Pages: 8