ACOUSTIC SCENE CLASSIFICATION WITH MISMATCHED RECORDING DEVICES USING MIXTURE OF EXPERTS LAYER

Cited by: 11
Authors
Truc Nguyen [1]
Pernkopf, Franz [1]
Affiliation
[1] Graz Univ Technol, Signal Proc & Speech Commun Lab, Inffeldgasse 16c, A-8010 Graz, Austria
Source
2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2019
Funding
Austrian Science Fund;
Keywords
Acoustic scene classification; convolutional neural network; mixture of experts layer; mixture of softmaxes;
DOI
10.1109/ICME.2019.00287
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Recently, mismatches in acoustic conditions between the development and evaluation data, such as a temporal recording gap or different recording devices, have been considered in Acoustic Scene Classification (ASC). This brings ASC closer to real-world conditions. In this paper, we address ASC with mismatched recording devices, which was introduced as task 1B of the DCASE 2018 challenge. We propose a flexible and robust model that replaces the fully connected dense layer with a mixture of experts (MoE) layer, so that each expert can adapt to a specific domain of the data. Furthermore, we investigate different Convolutional Neural Network (CNN) models as well as the number of experts in the MoE dense layer, using log-mel features. In addition, we apply mixup data augmentation to enhance the robustness of our models. In experiments, the classification performance is 66.1% using 15 experts in the MoE dense layer with approximately 2M parameters. This outperforms the best model of task 1B of the DCASE 2018 challenge by 2.5% (absolute); that challenge model uses an ensemble selection of 12 individual models with approximately 12M parameters.
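The two ingredients named in the abstract, an MoE layer in place of the final dense layer and mixup augmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the gating scheme (a softmax gate weighting per-expert softmax outputs, in the spirit of the "mixture of softmaxes" keyword), the feature size of 8, the 10-class output, the random initialization, and all names (`MoEDense`, `mixup`, `init_linear`) are assumptions made for illustration. Only the count of 15 experts comes from the abstract.

```python
import math
import random


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def linear(W, b, x):
    """Affine map W x + b for a weight matrix given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def init_linear(rng, n_out, n_in):
    """Small random weights and zero biases (hypothetical initialization)."""
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


class MoEDense:
    """Assumed MoE output layer: a gate mixes the experts' class posteriors."""

    def __init__(self, n_in, n_classes, n_experts, seed=0):
        rng = random.Random(seed)
        self.n_classes = n_classes
        self.gate = init_linear(rng, n_experts, n_in)
        self.experts = [init_linear(rng, n_classes, n_in) for _ in range(n_experts)]

    def forward(self, x):
        # Gate weights: one non-negative weight per expert, summing to 1.
        g = softmax(linear(*self.gate, x))
        out = [0.0] * self.n_classes
        for gk, (W, b) in zip(g, self.experts):
            # Each expert produces its own class distribution.
            pk = softmax(linear(W, b, x))
            out = [o + gk * p for o, p in zip(out, pk)]
        return out


def mixup(x1, y1, x2, y2, lam):
    """Convex combination of two examples and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


# Hypothetical sizes: 8 CNN features, 10 scene classes, 15 experts.
layer = MoEDense(n_in=8, n_classes=10, n_experts=15)
probs = layer.forward([0.1] * 8)
mixed_x, mixed_y = mixup([1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], lam=0.25)
```

Because the gate weights and each expert's output both sum to 1, `probs` is itself a valid class distribution. In mixup training the coefficient `lam` is usually sampled per batch from a Beta distribution, for example with `random.betavariate(alpha, alpha)`; the value of `alpha` is not given in the abstract.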
Pages: 1666 - 1671
Page count: 6
Related papers
50 in total
  • [31] Acoustic Scene Classification using Kervolution-Based SubSpectralNet
    Nandi, Ritika
    Shekhar, Shashank
    Mulimani, Manjunath
    INTERSPEECH 2021, 2021, : 561 - 565
  • [32] Deep Semantic Encoder-Decoder Network for Acoustic Scene Classification with Multiple Devices
    Ma, Xinxin
    Shao, Yunfei
    Ma, Yong
    Zhang, Wei-Qiang
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 365 - 370
  • [33] A Multi-Accent Acoustic Model using Mixture of Experts for Speech Recognition
    Jain, Abhinav
    Singh, Vishwanath P.
    Rath, Shakti P.
    INTERSPEECH 2019, 2019, : 779 - 783
  • [34] Capturing Discriminative Information Using a Deep Architecture in Acoustic Scene Classification
    Shim, Hye-jin
    Jung, Jee-weon
    Kim, Ju-ho
    Yu, Ha-jin
    APPLIED SCIENCES-BASEL, 2021, 11 (18):
  • [35] Acoustic Scene Classification using Asynchronous Multichannel Observations with Different Lengths
    Imoto, Keisuke
    Ono, Nobutaka
    2017 IEEE 19TH INTERNATIONAL WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING (MMSP), 2017,
  • [36] Acoustic Scene Classification Using Deep Audio Feature and BLSTM Network
    Li, Yanxiong
    Li, Xianku
    Zhang, Yuhan
    Wang, Wucheng
    Liu, Mingle
    Feng, Xiaohui
    2018 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2018, : 371 - 374
  • [37] Acoustic Scene Classification Using Multichannel Observation with Partially Missing Channels
    Imoto, Keisuke
    29TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2021), 2021, : 875 - 879
  • [38] ACOUSTIC SCENE CLASSIFICATION USING HIGHER-ORDER AMBISONIC FEATURES
    Green, Marc C.
    Adavanne, Sharath
    Murphy, Damian
    Virtanen, Tuomas
    2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA), 2019, : 328 - 332
  • [39] Scene Classification for Weak Devices Using Spatial Oriented Gradient Indexing
    Phung Minh Tung
    Tu Trung Hieu
    EIGHTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2016), 2017, 10225
  • [40] Multi-level distance embedding learning for robust acoustic scene classification with unseen devices
    Jiang, Gang
    Ma, Zhongchen
    Mao, Qirong
    Zhang, Jianming
    PATTERN ANALYSIS AND APPLICATIONS, 2023, 26 (03) : 1089 - 1099