Improving Audio Classification Method by Combining Self-Supervision with Knowledge Distillation

Cited by: 0
Authors
Gong, Xuchao [1]
Duan, Hongjie [1]
Yang, Yaozhong [1]
Tan, Lizhuang [2, 3]
Wang, Jian [4]
Vasilakos, Athanasios V. [5]
Affiliations
[1] Shengli Petr Management Bur, Artificial Intelligence Res Inst, Dongying 257000, Peoples R China
[2] Qilu Univ Technol, Shandong Acad Sci, Shandong Comp Sci Ctr, Natl Supercomp Ctr Jinan, Key Lab Comp Power Networ, Jinan 250013, Peoples R China
[3] Shandong Fundamental Res Ctr Comp Sci, Shandong Prov Key Lab Comp Networks, Jinan 250013, Peoples R China
[4] China Univ Petr East China, Coll Sci, Qingdao 266580, Peoples R China
[5] Univ Agder UiA, Ctr AI Res CAIR, Dept ICT, N-4879 Grimstad, Norway
Funding
National Natural Science Foundation of China;
Keywords
audio classification; comparative learning; knowledge distillation; masked auto-encoder; self-supervision; transformer; REPRESENTATION;
DOI
10.3390/electronics13010052
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Current single-modality (audio-only) self-supervised classification mainly relies on audio spectrum reconstruction. This single form of self-supervision cannot fully mine the key semantic information in the time and frequency domains. To address this, the article proposes a self-supervised method combined with knowledge distillation to further improve audio classification performance. First, given the two-dimensional nature of the audio spectrum, self-supervised strategies are constructed both along a single dimension (time or frequency) and in the joint time-frequency dimension, so that spectrum details and key discriminative information are learned through information reconstruction, comparative learning, and other methods. Second, for feature-level self-supervision, two teacher-student learning strategies are constructed: one internal to the model and one based on knowledge distillation. Fitting the feature representation ability of the teacher model further improves the generalization of audio classification. Comparative experiments were conducted on the AudioSet, ESC50, and VGGSound datasets. The results show that the proposed algorithm improves recognition accuracy by 0.5% to 1.3% over the best audio-only method.
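As a rough illustration of the single-dimension and joint time-frequency self-supervision mentioned in the abstract, the sketch below masks a spectrogram along time, along frequency, or cell by cell, and scores a reconstruction only on the masked cells. It is not the authors' implementation: the names `mask_spectrogram` and `reconstruction_loss`, the 0.5 masking ratio, zero-filling of masked cells, and the mean-squared-error objective are all assumptions made for illustration.

```python
import random


def mask_spectrogram(spec, mode, ratio=0.5, seed=0):
    """Mask a spectrogram given as a list of rows (frequency bins x time frames).

    mode="time"  masks whole time frames (columns),
    mode="freq"  masks whole frequency bins (rows),
    mode="joint" masks individual time-frequency cells.
    Returns (masked_copy, mask), where mask[f][t] is True for hidden cells.
    All choices here (ratio, zero-filling) are illustrative assumptions.
    """
    rng = random.Random(seed)
    n_freq, n_time = len(spec), len(spec[0])
    mask = [[False] * n_time for _ in range(n_freq)]

    if mode == "time":
        for t in rng.sample(range(n_time), int(n_time * ratio)):
            for f in range(n_freq):
                mask[f][t] = True
    elif mode == "freq":
        for f in rng.sample(range(n_freq), int(n_freq * ratio)):
            mask[f] = [True] * n_time
    elif mode == "joint":
        cells = [(f, t) for f in range(n_freq) for t in range(n_time)]
        for f, t in rng.sample(cells, int(len(cells) * ratio)):
            mask[f][t] = True
    else:
        raise ValueError("mode must be 'time', 'freq', or 'joint'")

    # Hidden cells are replaced by 0.0; visible cells are copied unchanged.
    masked = [
        [0.0 if mask[f][t] else spec[f][t] for t in range(n_time)]
        for f in range(n_freq)
    ]
    return masked, mask


def reconstruction_loss(pred, target, mask):
    """Mean squared error computed over the masked cells only."""
    errors = [
        (pred[f][t] - target[f][t]) ** 2
        for f in range(len(target))
        for t in range(len(target[0]))
        if mask[f][t]
    ]
    return sum(errors) / len(errors) if errors else 0.0
```

In a real pipeline, a model would predict the hidden cells from the visible ones, and `reconstruction_loss` would compare that prediction with the original spectrogram. The contrastive and distillation objectives named in the abstract are not covered by this sketch.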
Pages: 17