TFECN: Time-Frequency Enhanced ConvNet for Audio Classification

Cited by: 0
Authors
Wang, Mengwei [1, 2]
Yang, Zhe [1, 2]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
audio classification; large kernel ConvNet; transfer learning
DOI
10.21437/Interspeech.2023-734
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, transformer-based models have shown leading performance in audio classification, gradually replacing the previously dominant ConvNets. However, some research has shown that certain characteristics and designs of transformers can be applied to other architectures, allowing them to achieve performance similar to that of transformers. In this paper, we introduce TFECN, a pure ConvNet that incorporates design elements from transformers and uses time-frequency enhanced convolution with large kernels. It provides a global receptive field along the frequency dimension and avoids the effect of convolution's shift-equivariance on the recognition of patterns that are not shift-invariant along the frequency axis. Furthermore, to use ImageNet-pretrained weights, we propose a method for transferring weights between kernels of different sizes. On the commonly used datasets AudioSet, FSD50K, and ESC50, our TFECN outperforms the models trained in the same
Pages: 281-285
Page count: 5
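The abstract mentions transferring ImageNet-pretrained weights between kernels of different sizes but does not say how this is done. The sketch below shows one generic way to approach the idea, bilinear resampling of a single 2D kernel followed by an area-ratio rescale. It is an assumption for illustration, not the method proposed in the paper, and the function name `resize_kernel` is hypothetical.

```python
def resize_kernel(kernel, out_h, out_w):
    """Bilinearly resample a 2D kernel (list of rows) to out_h x out_w.

    Illustrative assumption only: this is NOT the transfer method from the
    TFECN paper. Corners of the input and output grids are kept aligned, and
    the result is scaled by the area ratio so that the total weight stays
    roughly the same (exactly the same for a constant kernel).
    """
    in_h, in_w = len(kernel), len(kernel[0])

    def src_coord(i, n_out, n_in):
        # Map an output index onto the input grid, corner to corner.
        return 0.0 if n_out == 1 else i * (n_in - 1) / (n_out - 1)

    # Area ratio compensates for the change in the number of taps.
    scale = (in_h * in_w) / (out_h * out_w)

    resized = []
    for i in range(out_h):
        y = src_coord(i, out_h, in_h)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = src_coord(j, out_w, in_w)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate along the width, then along the height.
            top = kernel[y0][x0] * (1 - wx) + kernel[y0][x1] * wx
            bottom = kernel[y1][x0] * (1 - wx) + kernel[y1][x1] * wx
            row.append((top * (1 - wy) + bottom * wy) * scale)
        resized.append(row)
    return resized


# Usage: stretch a small 3x3 kernel to a larger 5x7 one.
small = [[0.0, 1.0, 0.0],
         [1.0, 4.0, 1.0],
         [0.0, 1.0, 0.0]]
large = resize_kernel(small, 5, 7)
```

In a real network this would be applied to every (output channel, input channel) slice of a convolution weight tensor. Whether any rescaling is appropriate depends on the normalization layers that follow, which the abstract does not describe.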