Augmented Strategy For Polyphonic Sound Event Detection

Cited by: 0
Authors
Wang, Bolun [1]
Fu, Zhong-Hua [1, 2]
Wu, Hao [1]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian, Peoples R China
[2] Xian IFLYTEK Hyper Brain Informat Technol Co Ltd, Xian, Peoples R China
Keywords
Sound event detection; Data augmentation; Model fusion; ACOUSTIC SCENES; CLASSIFICATION;
DOI
Not available
Chinese Library Classification
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Sound event detection is important for applications such as audio content retrieval, intelligent monitoring, and scene-based interaction. Traditional studies on this topic focus mainly on identifying a single sound event class. In real applications, however, several sound events usually occur concurrently and with different durations. This leads to a new detection task: polyphonic sound event classification together with estimation of event time boundaries. In this paper, we propose an augmented strategy for this task, which faces the challenge of a large amount of unbalanced and weakly labelled training data. Specifically, the strategy comprises data augmentation, which enriches the training set to eliminate data imbalance; a new loss function that combines cross entropy and F-score; and model fusion, which integrates the strengths of different classifiers. The strategy is validated on the DCASE2019 dataset, where both event-level and segment-level detection improve significantly over the baseline system.
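The abstract names a loss that combines cross entropy and F-score but does not give its form. The following is a minimal sketch of one common way such a combination can be built for multi-label outputs, not the authors' formulation: binary cross entropy plus a differentiable "soft" F1 term. The function names, the weighting parameter `alpha`, and the per-class averaging are all assumptions made for illustration.

```python
import numpy as np


def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy over all frames and classes."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))


def soft_f1_loss(y_true, y_prob, eps=1e-7):
    """1 - soft F1, averaged over classes (axis 0 indexes frames or clips).

    Probabilities stand in for hard decisions, so the counts are
    fractional and the expression stays differentiable.
    """
    tp = np.sum(y_true * y_prob, axis=0)
    fp = np.sum((1.0 - y_true) * y_prob, axis=0)
    fn = np.sum(y_true * (1.0 - y_prob), axis=0)
    f1 = (2.0 * tp + eps) / (2.0 * tp + fp + fn + eps)
    return float(1.0 - np.mean(f1))


def combined_loss(y_true, y_prob, alpha=0.5):
    """Hypothetical weighting: alpha * BCE + (1 - alpha) * (1 - soft F1)."""
    return alpha * bce_loss(y_true, y_prob) + (1.0 - alpha) * soft_f1_loss(y_true, y_prob)


# Two frames, two event classes; both events may be active in a frame.
y_true = np.array([[1.0, 0.0], [1.0, 1.0]])
y_good = np.array([[0.9, 0.1], [0.8, 0.7]])
y_bad = np.array([[0.2, 0.8], [0.3, 0.1]])
print(combined_loss(y_true, y_good), combined_loss(y_true, y_bad))
```

The cross entropy term gives a stable per-frame gradient, while the soft F1 term is computed per class and so is less dominated by frequent classes, which is one plausible motivation for pairing the two on unbalanced data.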
Pages: 1496-1500
Page count: 5
Related papers
50 in total
  • [41] CONTRASTIVE LOSS BASED FRAME-WISE FEATURE DISENTANGLEMENT FOR POLYPHONIC SOUND EVENT DETECTION
    Guan, Yadong
    Han, Jiqing
    Song, Hongwei
    Song, Wenjie
    Zheng, Guibin
    Zheng, Tieran
    He, Yongjun
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 1021 - 1025
  • [42] Polyphonic Sound Event Tracking Using Linear Dynamical Systems
    Benetos, Emmanouil
    Lafay, Gregoire
    Lagrange, Mathieu
    Plumbley, Mark D.
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2017, 25 (06) : 1266 - 1277
  • [43] How Robust are Audio Embeddings for Polyphonic Sound Event Tagging?
    Abesser, Jakob
    Grollmisch, Sascha
    Mueller, Meinard
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 2658 - 2667
  • [44] SALSA-LITE: A FAST AND EFFECTIVE FEATURE FOR POLYPHONIC SOUND EVENT LOCALIZATION AND DETECTION WITH MICROPHONE ARRAYS
    Thi Ngoc Tho Nguyen
    Jones, Douglas L.
    Watcharasupat, Karn N.
    Huy Phan
    Gan, Woon-Seng
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 716 - 720
  • [45] A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification
    Wang, Yun
    Metze, Florian
    18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 3097 - 3101
  • [46] Teacher-Student Framework for Polyphonic Semi-supervised Sound Event Detection: Survey and Empirical Analysis
    Diffallah, Zhor
    Ykhlef, Hadjer
    Bouarfa, Hafida
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2024, 15 (05)
  • [47] POLYPHONIC SOUND EVENT DETECTION USING CONVOLUTIONAL BIDIRECTIONAL LSTM AND SYNTHETIC DATA-BASED TRANSFER LEARNING
    Jung, Seokwon
    Park, Jungbae
    Lee, Sangwan
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 885 - 889
  • [48] A Method Based on Dual Cross-Modal Attention and Parameter Sharing for Polyphonic Sound Event Localization and Detection
    Lee, Sang-Hoon
    Hwang, Jung-Wook
    Song, Min-Hwan
    Park, Hyung-Min
    APPLIED SCIENCES-BASEL, 2022, 12 (10):
  • [49] Polyphonic Sound Event Detection Based on Residual Convolutional Recurrent Neural Network With Semi-Supervised Loss Function
    Kim, Nam Kyun
    Kim, Hong Kook
    IEEE ACCESS, 2021, 9 (09) : 7564 - 7575
  • [50] Drum Sound Detection in Polyphonic Music with Hidden Markov Models
    Paulus, Jouni
    Klapuri, Anssi
    EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2009,