Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation

Times cited: 0
Authors
Gong, Xuchao [1]
Duan, Hongjie [1]
Yang, Yaozhong [1]
Tan, Lizhuang [2,3]
Wang, Jian [4]
Vasilakos, Athanasios V. [5]
Affiliations
[1] Shengli Petr Management Bur, Artificial Intelligence Res Inst, Dongying 257000, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Key Lab Comp Power Networ, Jinan 250013, Peoples R China
[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan 250013, Peoples R China
[4] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[5] Univ Agder UiA, Ctr AI Res CAIR, Dept ICT, N-4879 Grimstad, Norway
Funding
National Natural Science Foundation of China
Keywords
audio classification; comparative learning; knowledge distillation; masked auto-encoder; self-supervision; transformer; representation
DOI
10.3390/electronics13010052
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Current single-modality (audio-only) self-supervised classification methods mainly rely on audio spectrogram reconstruction. This self-supervision strategy is relatively limited and cannot fully mine the key semantic information in the time and frequency domains. To address this, this article proposes a self-supervised method combined with knowledge distillation to further improve audio classification performance. First, exploiting the two-dimensional structure of the audio spectrogram, self-supervision tasks are constructed both along each single dimension (time and frequency separately) and along the joint time-frequency dimension. Through information reconstruction, contrastive learning, and related methods, the model effectively learns spectrogram details and key discriminative information. Second, for feature-level self-supervision, two teacher-student learning strategies are constructed: one internal to the model and one based on knowledge distillation. By fitting the teacher model's feature representations, the student further improves the generalization of audio classification. Comparative experiments were conducted on the AudioSet, ESC-50, and VGGSound datasets. The results show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% over the best existing audio-only method.
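The three ingredients named in the abstract can be illustrated with a minimal plain-Python sketch. It shows single-dimension masking along time or frequency and joint time-frequency patch masking, scored with a masked-autoencoder-style reconstruction loss computed only on hidden cells. It also shows an InfoNCE contrastive loss and a teacher-student feature-distillation loss with an exponential-moving-average (EMA) teacher. This is an assumption-laden illustration, not the authors' implementation: all function names, mask widths, the patch size, the temperature, and the EMA decay are hypothetical. The EMA teacher is only one common way to realize a teacher "internal to the model" and is not confirmed by the abstract.

```python
import math
import random

# Hypothetical sketch, not the paper's code.
# A spectrogram is a list of rows: spec[f][t] = energy at frequency bin f, time frame t.

def _apply(spec, mask):
    """Zero out every cell where mask is True (the model must reconstruct these)."""
    return [[0.0 if m else v for v, m in zip(row, mrow)] for row, mrow in zip(spec, mask)]

def time_mask(spec, start, width):
    """Single-dimension masking along time: hide frames [start, start + width)."""
    n_freq, n_time = len(spec), len(spec[0])
    mask = [[start <= t < start + width for t in range(n_time)] for _ in range(n_freq)]
    return _apply(spec, mask), mask

def freq_mask(spec, start, width):
    """Single-dimension masking along frequency: hide bins [start, start + width)."""
    n_freq, n_time = len(spec), len(spec[0])
    mask = [[start <= f < start + width] * n_time for f in range(n_freq)]
    return _apply(spec, mask), mask

def patch_mask(spec, patch, ratio, rng):
    """Joint time-frequency masking: hide a random fraction of patch x patch tiles."""
    n_freq, n_time = len(spec), len(spec[0])
    tiles = [(f, t) for f in range(0, n_freq, patch) for t in range(0, n_time, patch)]
    chosen = set(rng.sample(tiles, int(len(tiles) * ratio)))
    mask = [[(f - f % patch, t - t % patch) in chosen for t in range(n_time)]
            for f in range(n_freq)]
    return _apply(spec, mask), mask

def masked_mse(pred, target, mask):
    """MAE-style reconstruction loss, averaged over masked cells only."""
    errs = [(p - y) ** 2
            for prow, yrow, mrow in zip(pred, target, mask)
            for p, y, m in zip(prow, yrow, mrow) if m]
    return sum(errs) / len(errs) if errs else 0.0

def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)

def info_nce(anchors, positives, temperature=0.1):
    """Contrastive (InfoNCE) loss: anchors[i] should match positives[i];
    the other positives in the batch act as negatives."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [_cosine(a, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_z - logits[i]
    return total / len(anchors)

def ema_update(teacher, student, decay=0.99):
    """Teacher weights track an exponential moving average of the student's weights."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]

def feature_distill_loss(student_feat, teacher_feat):
    """Feature-level distillation: the student regresses the teacher's features (MSE)."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)

if __name__ == "__main__":
    rng = random.Random(0)
    spec = [[float(f * 6 + t) for t in range(6)] for f in range(4)]  # toy 4x6 spectrogram
    views = {
        "time": time_mask(spec, 1, 2),
        "freq": freq_mask(spec, 0, 1),
        "joint": patch_mask(spec, 2, 0.5, rng),
    }
    for name, (masked, mask) in views.items():
        # Stand-in "model" that returns its masked input; a real encoder-decoder goes here.
        print(name, masked_mse(masked, spec, mask))
```

Restricting the reconstruction loss to masked cells follows the masked auto-encoder convention. Otherwise the model could lower the loss simply by copying the visible input.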
Pages: 17
Related papers (50 in total)
  • [41] Self-supervision Based Dual-Transformation Learning for Stain Normalization, Classification and Segmentation
    Gehlot, Shiv
    Gupta, Anubha
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2021, 2021, 12966 : 477 - 486
  • [42] A Novel Knowledge Distillation Method for Self-Supervised Hyperspectral Image Classification
    Chi, Qiang
    Lv, Guohua
    Zhao, Guixin
    Dong, Xiangjun
    REMOTE SENSING, 2022, 14 (18)
  • [43] PVASS-MDD: Predictive Visual-audio Alignment Self-supervision for Multimodal Deepfake Detection
    Yu Y.
    Liu X.
    Ni R.
    Yang S.
    Zhao Y.
    Kot A.C.
    IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 1 - 1
  • [44] Improving Semi-Supervised Learning for Remaining Useful Lifetime Estimation Through Self-Supervision
    Krokotsch, Tilman
    Knaak, Mirko
    Guehmann, Clemens
    INTERNATIONAL JOURNAL OF PROGNOSTICS AND HEALTH MANAGEMENT, 2022, 13 (01) : 1 - 19
  • [45] Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
    Scholz, Julien
    Weber, Cornelius
    Hafez, Muhammad Burhan
    Wermter, Stefan
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [46] Self-supervision assisted multimodal remote sensing image classification with coupled self-looping convolution networks
    Pande, Shivam
    Banerjee, Biplab
    NEURAL NETWORKS, 2023, 164 : 1 - 20
  • [47] Domain-guided Self-supervision of EEG Data Improves Downstream Classification Performance and Generalizability
    Wagh, Neeraj
    Wei, Jionghao
    Rawal, Samarth
    Berry, Brent
    Barnard, Leland
    Brinkmann, Benjamin
    Worrell, Gregory
    Jones, David
    Varatharajah, Yogatheesan
    MACHINE LEARNING FOR HEALTH, VOL 158, 2021, 158 : 130 - 142
  • [48] TRUSformer: improving prostate cancer detection from micro-ultrasound using attention and self-supervision
    Gilany, Mahdi
    Wilson, Paul
    Perera-Ortega, Andrea
    Jamzad, Amoon
    To, Minh Nguyen Nhat
    Fooladgar, Fahimeh
    Wodlinger, Brian
    Abolmaesumi, Purang
    Mousavi, Parvin
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2023, 18 (07) : 1193 - 1200
  • [50] SEMI-SUPERVISED DOMAIN ADAPTATION FOR ACOUSTIC SCENE CLASSIFICATION BY MINIMAX ENTROPY AND SELF-SUPERVISION APPROACHES
    Takahashi, Yukiko
    Takamuku, Sawa
    Imoto, Keisuke
    Natori, Naotake
    2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,