CONTRASTIVE LEARNING OF GENERAL-PURPOSE AUDIO REPRESENTATIONS

Cited by: 90
Authors
Saeed, Aaqib [1 ]
Grangier, David [2 ]
Zeghidour, Neil [2 ]
Affiliations
[1] Eindhoven Univ Technol, Eindhoven, Netherlands
[2] Google Res, Paris, France
Keywords
self-supervised learning; audio; sound;
DOI
10.1109/ICASSP39728.2021.9413528
Chinese Library Classification
O42 [Acoustics];
Discipline Codes
070206; 082403;
Abstract
We introduce COLA, a self-supervised pre-training approach for learning a general-purpose representation of audio. Our approach is based on contrastive learning: it learns a representation which assigns high similarity to audio segments extracted from the same recording while assigning lower similarity to segments from different recordings. We build on top of recent advances in contrastive learning for computer vision and reinforcement learning to design a lightweight, easy-to-implement self-supervised model of audio. We pre-train embeddings on the large-scale Audioset database and transfer these representations to 9 diverse classification tasks, including speech, music, animal sounds, and acoustic scenes. We show that despite its simplicity, our method significantly outperforms previous self-supervised systems. We furthermore conduct ablation studies to identify key design choices and release a library(1) to pre-train and fine-tune COLA models.
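The abstract's core idea — score pairs of segments from the same recording higher than pairs from different recordings, via a multi-class cross-entropy over the batch — can be sketched as follows. This is an illustrative NumPy sketch, not the authors' released implementation; the function name, the bilinear comparison matrix `W`, and all shapes are assumptions for demonstration.

```python
import numpy as np

def cola_style_contrastive_loss(anchors, positives, W):
    """Contrastive loss in the spirit of the abstract: each anchor segment
    should be most similar to the positive segment cut from the same
    recording (the matching row), relative to segments from all other
    recordings in the batch.

    anchors, positives: (batch, dim) embeddings of two segments per
    recording; W: (dim, dim) bilinear comparison matrix (illustrative).
    """
    # Bilinear similarity between every anchor and every positive.
    scores = anchors @ W @ positives.T            # (batch, batch)
    # Subtract the row max for numerical stability before the softmax.
    scores = scores - scores.max(axis=1, keepdims=True)
    # Log-softmax over each row; the true match sits on the diagonal.
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Multi-class cross-entropy with the diagonal as the target class.
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 16))                      # 8 recordings, dim 16
p = a + 0.01 * rng.normal(size=(8, 16))           # near-identical positives
loss = cola_style_contrastive_loss(a, p, np.eye(16))
```

With positives close to their anchors and a well-conditioned `W`, the diagonal dominates each row and the loss approaches zero; mismatched pairs drive it upward.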
Pages: 3875-3879 (5 pages)
Related Papers
50 items total
  • [1] Decorrelating Feature Spaces for Learning General-Purpose Audio Representations
    Ghosh, Sreyan
    Seth, Ashish
    Umesh, S.
    [J]. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06) : 1402 - 1414
  • [2] BYOL for Audio: Exploring Pre-Trained General-Purpose Audio Representations
    Niizumi, Daisuke
    Takeuchi, Daiki
    Ohishi, Yasunori
    Harada, Noboru
    Kashino, Kunio
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 137 - 151
  • [3] BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
    Niizumi, Daisuke
    Takeuchi, Daiki
    Ohishi, Yasunori
    Harada, Noboru
    Kashino, Kunio
    [J]. 2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [4] Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
    Niizumi, Daisuke
    Takeuchi, Daiki
    Ohishi, Yasunori
    Harada, Noboru
    Kashino, Kunio
    [J]. HEAR: HOLISTIC EVALUATION OF AUDIO REPRESENTATIONS, VOL 166, 2021, 166 : 1 - 24
  • [5] Bioinspired framework for general-purpose learning
    de Toledo, SA
    Barreiro, JM
    [J]. FOUNDATIONS AND TOOLS FOR NEURAL MODELING, PROCEEDINGS, VOL I, 1999, 1606 : 507 - 516
  • [6] Implicitly perturbed Hamiltonian as a class of versatile and general-purpose molecular representations for machine learning
    Alibakhshi, Amin
    Hartke, Bernd
    [J]. NATURE COMMUNICATIONS, 2022, 13 (01)
  • [8] LEARNING ON VLSI - A GENERAL-PURPOSE DIGITAL NEUROCHIP
    DURANTON, M
    SIRAT, JA
    [J]. PHILIPS JOURNAL OF RESEARCH, 1990, 45 (01) : 1 - 17
  • [9] The Pixels and Sounds of Emotion: General-Purpose Representations of Arousal in Games
    Makantasis, Konstantinos
    Liapis, Antonios
    Yannakakis, Georgios N.
    [J]. IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2023, 14 (01) : 680 - 693
  • [10] Learning General-Purpose Representations for Cross-Domain Hyperspectral Images Classification with Small Samples
    Gao, Kuiliang
    Yu, Anzhu
    You, Xiong
    Qiu, Chunping
    Liu, Bing
    Guo, Wenyue
    [J]. REMOTE SENSING, 2023, 15 (04)