Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation

Times cited: 0
Authors
Gong, Xuchao [1 ]
Duan, Hongjie [1 ]
Yang, Yaozhong [1 ]
Tan, Lizhuang [2 ,3 ]
Wang, Jian [4 ]
Vasilakos, Athanasios V. [5 ]
Affiliations
[1] Shengli Petr Management Bur, Artificial Intelligence Res Inst, Dongying 257000, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan,Key Lab Comp Power Networ, Jinan 250013, Peoples R China
[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan 250013, Peoples R China
[4] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[5] Univ Agder UiA, Ctr AI Res CAIR, Dept ICT, N-4879 Grimstad, Norway
Funding
National Natural Science Foundation of China
Keywords
audio classification; comparative learning; knowledge distillation; masked auto-encoder; self-supervision; transformer; REPRESENTATION
DOI
10.3390/electronics13010052
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Current single-modality (audio-only) self-supervised audio classification mainly relies on reconstructing the audio spectrogram. This self-supervision strategy is rather limited and cannot fully mine the key semantic information in the time and frequency domains. To address this, this article proposes a self-supervised method combined with knowledge distillation to further improve audio classification performance. First, exploiting the two-dimensional structure of the audio spectrogram, self-supervision tasks are constructed both separately along the time and frequency dimensions and jointly over the time-frequency dimension, so that spectrogram details and key discriminative information are learned through information reconstruction, contrastive learning, and related methods. Second, for feature-level self-supervision, two teacher-student learning strategies are constructed: one internal to the model and one based on knowledge distillation. By fitting the feature representation ability of the teacher model, the student further improves the generalization of audio classification. Comparative experiments on the AudioSet, ESC50, and VGGSound datasets show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% over the best existing audio-only method.
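The abstract gives no implementation details. Purely as an illustration, the following standard-library Python sketch shows generic versions of the ideas it names: masking a spectrogram along the time axis, along the frequency axis, or jointly in time-frequency patches; a Hinton-style soft-target distillation loss; and an exponential-moving-average (EMA) teacher, which is one common way to build a teacher inside the model itself. All function names, the zero-fill masking, the temperature, and the EMA decay are assumptions for illustration, not the authors' method.

```python
import math
import random

# Hypothetical helpers (not from the paper). A spectrogram is a list of rows:
# rows = frequency bins, columns = time frames.


def mask_time(spec, width, rng):
    """Zero a random contiguous band of `width` time frames (columns)."""
    n_time = len(spec[0])
    start = rng.randrange(0, n_time - width + 1)
    out = [row[:] for row in spec]  # copy so the input is left untouched
    for row in out:
        for t in range(start, start + width):
            row[t] = 0.0
    return out


def mask_freq(spec, width, rng):
    """Zero a random contiguous band of `width` frequency bins (rows)."""
    start = rng.randrange(0, len(spec) - width + 1)
    out = [row[:] for row in spec]
    for f in range(start, start + width):
        out[f] = [0.0] * len(out[f])
    return out


def mask_patches(spec, patch, ratio, rng):
    """Joint time-frequency masking: zero a random fraction of patch x patch tiles."""
    n_freq, n_time = len(spec), len(spec[0])
    tiles = [(f, t) for f in range(0, n_freq, patch) for t in range(0, n_time, patch)]
    out = [row[:] for row in spec]
    for f0, t0 in rng.sample(tiles, int(len(tiles) * ratio)):
        for f in range(f0, min(f0 + patch, n_freq)):
            for t in range(t0, min(t0 + patch, n_time)):
                out[f][t] = 0.0
    return out


def softmax(logits, temperature):
    """Numerically stable softmax with a temperature."""
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-target KL(teacher || student), scaled by T^2 as in Hinton et al."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps)) for pt, ps in zip(p_t, p_s))
    return kl * temperature ** 2


def ema_update(teacher, student, decay=0.99):
    """Assumed internal teacher: EMA of the student's (flattened) weights."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]
```

In a full pipeline the masked spectrogram would be fed to the student, which is trained to reconstruct the hidden regions and to match the teacher's outputs; that training loop is omitted here.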
Pages: 17