TFECN: Time-Frequency Enhanced ConvNet for Audio Classification

Cited by: 0
Authors
Wang, Mengwei [1, 2]
Yang, Zhe [1, 2]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Soochow Univ, Prov Key Lab Comp Informat Proc Technol, Suzhou, Peoples R China
Source
Funding
National Natural Science Foundation of China
Keywords
audio classification; large kernel ConvNet; transfer learning;
DOI
10.21437/Interspeech.2023-734
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Recently, transformer-based models have shown leading performance in audio classification, gradually replacing the previously dominant ConvNets. However, some research has shown that certain characteristics and designs of transformers can be applied to other architectures, allowing them to achieve performance similar to that of transformers. In this paper, we introduce TFECN, a pure ConvNet that incorporates design elements from transformers and uses time-frequency enhanced convolution with large kernels. It provides a global receptive field along the frequency dimension and avoids the influence of the convolution's shift-equivariance on the recognition of patterns that are not shift-invariant along the frequency axis. Furthermore, to use ImageNet-pretrained weights, we propose a method for transferring weights between kernels of different sizes. On the commonly used datasets AudioSet, FSD50K, and ESC50, our TFECN outperforms the models trained in the same
Pages: 281-285
Page count: 5