ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Cited by: 8
Authors
Guzhov, Andrey [1, 2]
Raue, Federico [1]
Hees, Joern [1]
Dengel, Andreas [1, 2]
Affiliations
[1] DFKI GmbH, Kaiserslautern, Germany
[2] TU Kaiserslautern, Kaiserslautern, Germany
Keywords
audio; classification; ESC; Fourier transform; fbsp-wavelet
DOI
10.1109/IJCNN52387.2021.9533654
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer that is based on complex frequency B-spline (fbsp) wavelets. When used with a high-performance audio classification model, the proposed fbsp-layer provides an accuracy improvement over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets for weight initialization: ImageNet and AudioSet. Our proposed model outperforms other approaches, achieving accuracies of 95.20% on ESC-50 and 89.14% on UrbanSound8K. Additionally, we assess how much the proposed layer increases model robustness against additive white Gaussian noise and against a reduced effective sample rate, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations compared to STFT-based training. For the sake of reproducibility, our code is made available.
Pages: 8
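
The abstract names two ingredients that a short sketch may make more concrete: the complex frequency B-spline (fbsp) wavelet and the additive white Gaussian noise used in the robustness test. The Python below is a minimal illustration under stated assumptions, not the authors' implementation. It uses the textbook fbsp definition, psi(t) = sqrt(fb) * sinc(fb * t / m)^m * exp(2*pi*i*fc*t), with fixed rather than learned parameters. The function names `fbsp_wavelet` and `add_awgn`, the default parameter values, and the SNR-based noise scaling are all hypothetical choices made for this example.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Textbook complex frequency B-spline wavelet (assumed form).

    m  : integer order (>= 1)
    fb : bandwidth parameter
    fc : center frequency
    np.sinc is the normalized sinc, sin(pi * x) / (pi * x).
    """
    t = np.asarray(t, dtype=float)
    envelope = np.sqrt(fb) * np.sinc(fb * t / m) ** m
    carrier = np.exp(2j * np.pi * fc * t)
    return envelope * carrier


def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the given SNR in dB."""
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


# Illustrative usage: sample a wavelet and perturb a test tone at 10 dB SNR.
t = np.linspace(-4.0, 4.0, 513)
psi = fbsp_wavelet(t, m=2, fb=1.0, fc=1.5)
tone = np.sin(2.0 * np.pi * 440.0 * np.arange(16000) / 16000.0)
noisy_tone = add_awgn(tone, snr_db=10.0, rng=np.random.default_rng(0))
```

In the paper the wavelet parameters are part of a trainable layer; here they are plain arguments, so the sketch shows only the shape of the basis function and of the perturbation, not the training procedure.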