Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation

Cited by: 0

Authors:

Gong, Xuchao [1]

Duan, Hongjie [1]

Yang, Yaozhong [1]

Tan, Lizhuang [2,3]

Wang, Jian [4]

Vasilakos, Athanasios V. [5]

Affiliations:

[1] Shengli Petr Management Bur, Artificial Intelligence Res Inst, Dongying 257000, Peoples R China

[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan,Key Lab Comp Power Networ, Jinan 250013, Peoples R China

[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan 250013, Peoples R China

[4] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China

[5] Univ Agder UiA, Ctr AI Res CAIR, Dept ICT, N-4879 Grimstad, Norway

Source:

ELECTRONICS | 2024 / Vol. 13 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

audio classification; comparative learning; knowledge distillation; masked auto-encoder; self-supervision; transformer; REPRESENTATION;

DOI:

10.3390/electronics13010052

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Current single-modal audio self-supervised classification mainly adopts a strategy based on audio spectrum reconstruction. Overall, this self-supervised approach is relatively limited in form and cannot fully mine key semantic information in the time and frequency domains. To address this, this article proposes a self-supervised method combined with knowledge distillation to further improve the performance of audio classification tasks. Firstly, considering the particularity of the two-dimensional audio spectrum, self-supervised strategies are constructed both in a single dimension (the time domain or the frequency domain) and in the joint time-frequency dimension. Audio spectrum details and key discriminative information are learned effectively through information reconstruction, comparative learning, and other methods. Secondly, in terms of feature self-supervision, two learning strategies for teacher-student models are constructed: one internal to the model and one based on knowledge distillation. Fitting the feature expression ability of the teacher model further enhances the generalization of audio classification. Comparative experiments were conducted on the AudioSet, ESC50, and VGGSound datasets. The results show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% compared with the best existing single-modal audio method.
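The abstract names three ingredients: masking in the time and frequency dimensions, reconstruction of the masked spectrum, and fitting a teacher model's features. The following is a minimal illustrative sketch of those ideas in plain Python, not the authors' implementation. All function names, the mask ratios, and the use of mean squared error for both losses are assumptions made for illustration; the abstract does not specify them.

```python
import random


def mask_spectrogram(spec, time_ratio=0.0, freq_ratio=0.0, seed=0):
    """Zero out randomly chosen time frames (columns) and frequency bins (rows).

    spec is a list of rows: rows are frequency bins, columns are time frames.
    Returns (masked copy, set of masked frame indices, set of masked bin indices).
    The ratios are hypothetical hyperparameters, not values from the paper.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(spec), len(spec[0])
    masked_t = set(rng.sample(range(n_time), int(n_time * time_ratio)))
    masked_f = set(rng.sample(range(n_freq), int(n_freq * freq_ratio)))
    masked = [
        [0.0 if (f in masked_f or t in masked_t) else spec[f][t]
         for t in range(n_time)]
        for f in range(n_freq)
    ]
    return masked, masked_t, masked_f


def reconstruction_loss(pred, target, masked_t, masked_f):
    """Mean squared error computed over the masked cells only (assumed loss)."""
    errors = [
        (pred[f][t] - target[f][t]) ** 2
        for f in range(len(target))
        for t in range(len(target[0]))
        if f in masked_f or t in masked_t
    ]
    return sum(errors) / len(errors) if errors else 0.0


def distillation_loss(student_feat, teacher_feat):
    """Mean squared error between student and teacher feature vectors (assumed loss)."""
    return sum((s - t) ** 2 for s, t in zip(student_feat, teacher_feat)) / len(student_feat)


# Toy 4-bin x 8-frame spectrogram of ones, masked jointly in time and frequency.
spec = [[1.0] * 8 for _ in range(4)]
masked, masked_t, masked_f = mask_spectrogram(spec, time_ratio=0.5, freq_ratio=0.25)
```

Setting only `time_ratio` or only `freq_ratio` gives the single-dimension case, and setting both gives the joint time-frequency case. In a real system the masked spectrum would be fed to an encoder-decoder and `reconstruction_loss` would compare the decoder output with the original spectrum.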


Pages: 17

50 items in total

[41] Self-supervision Based Dual-Transformation Learning for Stain Normalization, Classification and Segmentation
Gehlot, Shiv
Gupta, Anubha
MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2021, 2021, 12966 : 477 - 486
[42] A Novel Knowledge Distillation Method for Self-Supervised Hyperspectral Image Classification
Chi, Qiang
Lv, Guohua
Zhao, Guixin
Dong, Xiangjun
REMOTE SENSING, 2022, 14 (18)
[43] PVASS-MDD: Predictive Visual-audio Alignment Self-supervision for Multimodal Deepfake Detection
Yu Y.
Liu X.
Ni R.
Yang S.
Zhao Y.
Kot A.C.
IEEE Transactions on Circuits and Systems for Video Technology, 2024, 34 (08) : 1 - 1
[44] Improving Semi-Supervised Learning for Remaining Useful Lifetime Estimation Through Self-Supervision
Krokotsch, Tilman
Knaak, Mirko
Guehmann, Clemens
INTERNATIONAL JOURNAL OF PROGNOSTICS AND HEALTH MANAGEMENT, 2022, 13 (01) : 1 - 19
[45] Improving Model-Based Reinforcement Learning with Internal State Representations through Self-Supervision
Scholz, Julien
Weber, Cornelius
Hafez, Muhammad Burhan
Wermter, Stefan
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
[46] Self-supervision assisted multimodal remote sensing image classification with coupled self-looping convolution networks
Pande, Shivam
Banerjee, Biplab
NEURAL NETWORKS, 2023, 164 : 1 - 20
[47] Domain-guided Self-supervision of EEG Data Improves Downstream Classification Performance and Generalizability
Wagh, Neeraj
Wei, Jionghao
Rawal, Samarth
Berry, Brent
Barnard, Leland
Brinkmann, Benjamin
Worrell, Gregory
Jones, David
Varatharajah, Yogatheesan
MACHINE LEARNING FOR HEALTH, VOL 158, 2021, 158 : 130 - 142
[48] TRUSformer: improving prostate cancer detection from micro-ultrasound using attention and self-supervision
Mahdi Gilany
Paul Wilson
Andrea Perera-Ortega
Amoon Jamzad
Minh Nguyen Nhat To
Fahimeh Fooladgar
Brian Wodlinger
Purang Abolmaesumi
Parvin Mousavi
International Journal of Computer Assisted Radiology and Surgery, 2023, 18 : 1193 - 1200
[49] TRUSformer: improving prostate cancer detection from micro-ultrasound using attention and self-supervision
Gilany, Mahdi
Wilson, Paul
Perera-Ortega, Andrea
Jamzad, Amoon
To, Minh Nguyen Nhat
Fooladgar, Fahimeh
Wodlinger, Brian
Abolmaesumi, Purang
Mousavi, Parvin
INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2023, 18 (07) : 1193 - 1200
[50] SEMI-SUPERVISED DOMAIN ADAPTATION FOR ACOUSTIC SCENE CLASSIFICATION BY MINIMAX ENTROPY AND SELF-SUPERVISION APPROACHES
Takahashi, Yukiko
Takamuku, Sawa
Imoto, Keisuke
Natori, Naotake
2022 INTERNATIONAL WORKSHOP ON ACOUSTIC SIGNAL ENHANCEMENT (IWAENC 2022), 2022,
