Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation

Cited by: 0
Authors
Gong, Xuchao [1 ]
Duan, Hongjie [1 ]
Yang, Yaozhong [1 ]
Tan, Lizhuang [2 ,3 ]
Wang, Jian [4 ]
Vasilakos, Athanasios V. [5 ]
Affiliations
[1] Shengli Petr Management Bur, Artificial Intelligence Res Inst, Dongying 257000, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Key Lab Comp Power Networ, Jinan 250013, Peoples R China
[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan 250013, Peoples R China
[4] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[5] Univ Agder UiA, Ctr AI Res CAIR, Dept ICT, N-4879 Grimstad, Norway
Funding
National Natural Science Foundation of China;
关键词
audio classification; contrastive learning; knowledge distillation; masked auto-encoder; self-supervision; transformer; REPRESENTATION;
DOI
10.3390/electronics13010052
CLC classification
TP [Automation and computer technology];
Subject classification code
0812;
Abstract
Current single-modality audio self-supervised classification mainly relies on a strategy of audio spectrum reconstruction. This self-supervision strategy is relatively narrow and cannot fully mine the key semantic information in the time and frequency domains. This article therefore proposes a self-supervised method combined with knowledge distillation to further improve audio classification performance. First, considering the particular structure of the two-dimensional audio spectrum, self-supervised tasks are constructed both along the individual time and frequency dimensions and along the joint time-frequency dimension; spectrum details and key discriminative information are learned effectively through information reconstruction, contrastive learning, and related methods. Second, for feature-level self-supervision, two teacher-student learning strategies are constructed: one internal to the model and one based on knowledge distillation. Fitting the teacher model's feature representation ability further enhances the generalization of audio classification. Comparative experiments on the AudioSet, ESC50, and VGGSound datasets show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% over the best single-modality audio methods.
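The two ingredients the abstract names, masking a spectrogram along its time and frequency dimensions and matching a student to a teacher via a softened distribution, can be illustrated with a minimal numpy sketch. This is not the authors' implementation; the function names, mask ratios, and temperature are illustrative assumptions, and the distillation term is the standard temperature-softened KL objective rather than the paper's exact loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_spectrogram(spec, time_ratio=0.3, freq_ratio=0.3):
    """Zero out random time frames and frequency bins of a (freq, time)
    spectrogram, mimicking masking along the single time and frequency
    dimensions; applying both at once gives a joint time-frequency mask."""
    n_freq, n_time = spec.shape
    masked = spec.copy()
    t_idx = rng.choice(n_time, size=int(n_time * time_ratio), replace=False)
    f_idx = rng.choice(n_freq, size=int(n_freq * freq_ratio), replace=False)
    masked[:, t_idx] = 0.0   # time-dimension masking
    masked[f_idx, :] = 0.0   # frequency-dimension masking
    return masked

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Temperature-softened KL divergence between teacher and student
    class distributions (the standard knowledge-distillation objective)."""
    p = softmax(teacher_logits / temperature)
    q = softmax(student_logits / temperature)
    return float(np.sum(p * (np.log(p) - np.log(q)))) * temperature ** 2

# Toy example: 128 mel bins x 256 frames, 10 output classes.
spec = rng.standard_normal((128, 256))
masked = mask_spectrogram(spec)
loss = distillation_loss(rng.standard_normal(10), rng.standard_normal(10))
```

A reconstruction model would then be trained to recover `spec` from `masked`, while the distillation term pulls the student's class distribution toward the teacher's.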
Pages: 17
Related Papers (50 total)
  • [21] Self-Path: Self-Supervision for Classification of Pathology Images With Limited Annotations
    Koohbanani, Navid Alemi
    Unnikrishnan, Balagopal
    Khurram, Syed Ali
    Krishnaswamy, Pavitra
    Rajpoot, Nasir
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (10) : 2845 - 2856
  • [22] Combining Semantic Self-Supervision and Self-Training for Domain Adaptation in Semantic Segmentation
    Niemeijer, Joshua
    Schaefer, Joerg P.
    2021 IEEE INTELLIGENT VEHICLES SYMPOSIUM WORKSHOPS (IV WORKSHOPS), 2021, : 364 - 371
  • [23] A Multi-scale Self-supervision Method for Improving Cell Nuclei Segmentation in Pathological Tissues
    Ali, Hesham
    Elattar, Mustafa
    Selim, Sahar
    MEDICAL IMAGE UNDERSTANDING AND ANALYSIS, MIUA 2022, 2022, 13413 : 751 - 763
  • [24] Improving an Acoustic Vehicle Detector Using an Iterative Self-Supervision Procedure
    Phathanapirom, Birdy
    Hite, Jason
    Dayman, Kenneth
    Chichester, David
    Johnson, Jared
    DATA, 2023, 8 (04)
  • [25] Fair Visual Recognition in Limited Data Regime using Self-Supervision and Self-Distillation
    Mazumder, Pratik
    Singh, Pravendra
    Namboodiri, Vinay P.
    2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022), 2022, : 3889 - 3897
  • [26] INS-GNN: Improving graph imbalance learning with self-supervision
    Juan, Xin
    Zhou, Fengfeng
    Wang, Wentao
    Jin, Wei
    Tang, Jiliang
    Wang, Xin
    INFORMATION SCIENCES, 2023, 637
  • [27] Improving Transferability of Representations via Augmentation-Aware Self-Supervision
    Lee, Hankook
    Lee, Kibok
    Lee, Kimin
    Lee, Honglak
    Shin, Jinwoo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [29] Multi-representation knowledge distillation for audio classification
    Gao, Liang
    Xu, Kele
    Wang, Huaimin
    Peng, Yuxing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (04) : 5089 - 5112
  • [30] TEMPORAL KNOWLEDGE DISTILLATION FOR ON-DEVICE AUDIO CLASSIFICATION
    Choi, Kwanghee
    Kersner, Martin
    Morton, Jacob
    Chang, Buru
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 486 - 490