Enhancing Zero-shot Audio Classification using Sound Attribute Knowledge from Large Language Models

Authors
Xu, Xuenan [1]
Zhang, Pingyue [1]
Yang, Ming [2]
Zhang, Ji [2]
Wu, Mengyue [1]
Affiliations
[1] Shanghai Jiao Tong Univ, MoE Key Lab Artificial Intelligence, X-LANCE Lab, Shanghai, Peoples R China
[2] Alibaba Grp, Inst Intelligent Comp, Hangzhou, Peoples R China
Source
Interspeech 2024
Funding
National Natural Science Foundation of China;
Keywords
zero-shot learning; audio classification; sound attribute; large language model; audio-text contrastive learning;
DOI
10.21437/Interspeech.2024-1692
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Zero-shot audio classification aims to recognize and classify sound classes that the model has never seen during training. This paper presents a novel approach for zero-shot audio classification using automatically generated sound attribute descriptions. We propose a list of sound attributes and leverage the domain knowledge of large language models to generate detailed attribute descriptions for each class. In contrast to previous works that primarily relied on class labels or simple descriptions, our method focuses on multi-dimensional innate auditory attributes, capturing different characteristics of sound classes. Additionally, we incorporate a contrastive learning approach to enhance zero-shot learning from textual labels. We validate the effectiveness of our method on VGGSound and AudioSet. Our results demonstrate a substantial improvement in zero-shot classification accuracy. Ablation results show robust performance enhancement, regardless of the model architecture.
Pages: 4808-4812
Number of pages: 5