Acoustic scene classification with multi-temporal complex modulation spectrogram features and a convolutional LSTM network

Authors
Sayeh Mirzaei
Iman Khani Jazani
Affiliations
[1] University of Tehran, School of Engineering Science, College of Engineering
[2] Amirkabir University of Technology, Faculty of Computer Engineering
Source
Multimedia Tools and Applications, 2023, 82(11)
Keywords
Acoustic scene classification; Convolutional neural network (CNN); Long short-term memory (LSTM); Conv-LSTM; Modulation spectrogram
DOI
Not available
Abstract
Acoustic scene classification (ASC) maps an environmental sound recording to one of a set of predefined classes representing the auditory scene of the recording. This paper proposes an ASC solution that combines convolutional neural networks, long short-term memory cells, and multi-temporal input encoding. The major novelty of the work is the use of the complex modulation spectrogram for feature extraction. We evaluate the complex modulation spectrogram as a discriminant feature and obtain a 4.7% improvement over the commonly used Mel spectrogram. These features are computed for individual temporal segments of the audio recording to obtain a representation containing both spectral and temporal structure. We also derive a de-noising method that has not previously been used for ASC but has proved beneficial in other speech processing tasks. This method yields a 1.5% improvement in prediction accuracy over a model without de-noising. The proposed model outperforms state-of-the-art methods by 7.5% in prediction accuracy on the evaluation data for ASC on the DCASE 2017 dataset.
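The abstract does not give the exact feature definition, so the following is only a minimal sketch of one common reading of a complex modulation spectrogram: a second Fourier transform taken along the frame axis of the complex short-time Fourier transform, computed separately for each temporal segment of the recording. The function names, the frame length, the hop size, and the number of segments are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def stft(signal, frame_len=256, hop=128):
    """Complex short-time Fourier transform, shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.fft.rfft(frames, axis=1)


def complex_modulation_spectrogram(signal, frame_len=256, hop=128):
    """Fourier transform of the complex STFT along the frame (time) axis.

    Assumed definition for illustration. Returns a complex array of
    shape (n_modulation_bins, n_acoustic_bins).
    """
    spec = stft(signal, frame_len, hop)
    return np.fft.fft(spec, axis=0)


def multi_temporal_features(signal, n_segments=4, frame_len=256, hop=128):
    """Compute the feature per equal-length temporal segment.

    Real and imaginary parts are stacked as two channels, giving shape
    (n_segments, n_modulation_bins, n_acoustic_bins, 2). Trailing samples
    that do not fill a whole segment are dropped.
    """
    seg_len = len(signal) // n_segments
    feats = []
    for k in range(n_segments):
        segment = signal[k * seg_len: (k + 1) * seg_len]
        cms = complex_modulation_spectrogram(segment, frame_len, hop)
        feats.append(np.stack([cms.real, cms.imag], axis=-1))
    return np.stack(feats)
```

A tensor laid out this way, with the segment index as a leading sequence axis, is the kind of input a convolutional LSTM could consume one segment at a time; whether the paper arranges its input this way is not stated in the abstract.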
Pages: 16395-16408
Number of pages: 13
Citation
Mirzaei, Sayeh; Jazani, Iman Khani. Acoustic scene classification with multi-temporal complex modulation spectrogram features and a convolutional LSTM network [J]. Multimedia Tools and Applications, 2023, 82(11): 16395-16408.