Unsupervised Learning of Acoustic Unit Descriptors for Audio Content Representation and Classification

Cited by: 0
Authors
Chaudhuri, Sourish [1]
Harvilla, Mark [1]
Raj, Bhiksha [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Keywords
audio representation; sound alphabet; unsupervised sound units
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we attempt to represent audio as a sequence of acoustic units learned without supervision, and we use these units for multi-class classification. We expect the acoustic units to capture individual sounds or sound sequences, so that together they form an automatically derived sound alphabet. We use audio from multi-class, YouTube-quality multimedia data to converge on a set of sound units, so that each audio file is represented as a sequence of these units. We then train a language model for each category over the acoustic-unit sequences and use these models to generate acoustic and language model scores for each category. Finally, we use a margin-based classification algorithm to weight the category scores and predict the class of each test data point. We compare different settings and report encouraging results on this task.
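The abstract describes a three-stage pipeline: discover acoustic units without supervision, score unit sequences with per-category language models, and combine the category scores with a margin-based classifier. The toy sketch below illustrates that flow under loudly stated assumptions and is not the authors' implementation. Plain k-means vector quantization of feature frames stands in for the paper's unit-learning procedure, add-one smoothed bigram models stand in for the category language models, and a multiclass margin perceptron stands in for the unspecified margin-based classifier. Only language-model scores are used (no acoustic-model scores). All names (`kmeans`, `to_units`, `BigramLM`, `features`, `train_margin_weights`) and the 1-D toy "frames" are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical sketch of the pipeline shape only; every component is an assumed stand-in.


def nearest(frame, centroids):
    """Index of the centroid closest to `frame` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(frame, centroids[j])))


def kmeans(frames, k, iters=20, seed=0):
    """Plain k-means over feature tuples (assumed stand-in for unit learning)."""
    centroids = random.Random(seed).sample(sorted(set(frames)), k)  # k distinct frames
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest(f, centroids)].append(f)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(d) / len(members) for d in zip(*members))
    return centroids


def to_units(frames, centroids):
    """Label each frame with its nearest centroid, then merge runs of one label."""
    labels = [nearest(f, centroids) for f in frames]
    return [u for i, u in enumerate(labels) if i == 0 or u != labels[i - 1]]


class BigramLM:
    """Add-one smoothed bigram model over unit sequences for one category."""

    def __init__(self, sequences, n_units):
        self.n_units = n_units
        self.n_out = n_units + 1  # every unit label plus an end-of-sequence symbol
        self.pairs, self.hist = Counter(), Counter()
        for seq in sequences:
            for h, w in self._bigrams(seq):
                self.pairs[(h, w)] += 1
                self.hist[h] += 1

    def _bigrams(self, seq):
        tokens = [-1] + list(seq) + [self.n_units]  # -1 = start, n_units = end
        return zip(tokens[:-1], tokens[1:])

    def logprob(self, seq):
        return sum(math.log((self.pairs[(h, w)] + 1) / (self.hist[h] + self.n_out))
                   for h, w in self._bigrams(seq))


def features(seq, lms):
    """Per-category log-probability divided by the bigram count, plus a bias term."""
    return [lm.logprob(seq) / (len(seq) + 1) for lm in lms] + [1.0]


def train_margin_weights(xs, ys, n_classes, epochs=100, lr=0.1, margin=1.0):
    """Multiclass margin perceptron (assumed stand-in): update whenever the true
    class fails to beat the best rival class by at least `margin`."""
    W = [[0.0] * len(xs[0]) for _ in range(n_classes)]
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            s = [sum(w * v for w, v in zip(W[c], x)) for c in range(n_classes)]
            rival = max((c for c in range(n_classes) if c != y), key=lambda c: s[c])
            if s[y] - s[rival] < margin:
                W[y] = [w + lr * v for w, v in zip(W[y], x)]
                W[rival] = [w - lr * v for w, v in zip(W[rival], x)]
    return W


def predict(W, x):
    """Category whose weighted score is highest."""
    return max(range(len(W)), key=lambda c: sum(w * v for w, v in zip(W[c], x)))


# Hypothetical toy data: category 0 alternates LOW/MID frames, category 1 HIGH/MID.
LOW, MID, HIGH = (0.0,), (5.0,), (10.0,)
train = [([LOW, LOW, MID, MID, LOW, MID], 0), ([LOW, MID, MID, LOW, LOW, MID], 0),
         ([HIGH, HIGH, MID, HIGH, MID, MID], 1), ([HIGH, MID, HIGH, HIGH, MID, HIGH], 1)]
centroids = kmeans([f for frames, _ in train for f in frames], k=3)
unit_seqs = [(to_units(frames, centroids), y) for frames, y in train]
lms = [BigramLM([s for s, y in unit_seqs if y == c], n_units=3) for c in range(2)]
W = train_margin_weights([features(s, lms) for s, _ in unit_seqs],
                         [y for _, y in unit_seqs], n_classes=2)
test = [[LOW, MID, LOW, LOW, MID, MID], [HIGH, MID, MID, HIGH, MID]]
predictions = [predict(W, features(to_units(f, centroids), lms)) for f in test]
print(predictions)
```

On this toy data the two categories are linearly separable in the language-model score space, so the margin updates stop once every training sequence clears the margin. Real audio would need frame-level spectral features and far more units than the three used here.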
Pages: 2276 - 2279
Page count: 4
Related papers (50 in total)
  • [1] Audio Event-Relational Graph Representation Learning for Acoustic Scene Classification
    Hou, Yuanbo
    Song, Siyang
    Yu, Chuang
    Wang, Wenwu
    Botteldooren, Dick
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2023, 30 : 1382 - 1386
  • [2] UNSUPERVISED DISCRIMINATIVE LEARNING OF SOUNDS FOR AUDIO EVENT CLASSIFICATION
    Hornauer, Sascha
    Li, Ke
    Yu, Stella X.
    Ghaffarzadegan, Shabnam
    Ren, Liu
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 3035 - 3039
  • [3] Unsupervised Representation Learning for Pulmonary Nodule Classification
    Jin, Xinyu
    Zhu, Fenghao
    Li, Lanjuan
    Xia, Qi
    [J]. 2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN (ISCID), VOL. 1, 2017, : 362 - 365
  • [4] Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders
    Chen, Mingjie
    Hain, Thomas
    [J]. INTERSPEECH 2020, 2020, : 4866 - 4870
  • [5] Supervised Representation Learning for Audio Scene Classification
    Rakotomamonjy, Alain
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1253 - 1265
  • [6] Dominant Audio Descriptors For Audio Classification and Retrieval
    Fadeev, Aleksey
    Missaoui, Oualid
    Frigui, Hichem
    [J]. EIGHTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2009, : 75 - 78
  • [7] Unsupervised Learning of Disentangled Speech Content and Style Representation
    Tjandra, Andros
    Pang, Ruoming
    Zhang, Yu
    Karita, Shigeki
    [J]. INTERSPEECH 2021, 2021, : 4089 - 4093
  • [8] Unsupervised SAR Representation Learning Improves Classification Performance
    Vaughn, Nolan
    Sullivan, Bo
    Jaskie, Kristen
    [J]. AUTOMATIC TARGET RECOGNITION XXXIV, 2024, 13039
  • [9] Remote Sensing Scene Classification by Unsupervised Representation Learning
    Lu, Xiaoqiang
    Zheng, Xiangtao
    Yuan, Yuan
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2017, 55 (09) : 5148 - 5157
  • [10] Learning audio sequence representations for acoustic event classification
    Zhang, Zixing
    Liu, Ding
    Han, Jing
    Qian, Kun
    Schuller, Bjorn W.
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2021, 178