Audio Event Recognition Based On DBN Features From Multiple Filter-bank Representations

Authors
Guo, Feng [1]
Chen, Xiaoou [1]
Yang, Deshun [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, 128 Zhongguancun North St, Beijing 100871, Peoples R China
Keywords
DOI
Not available
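Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract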
In audio event classification and detection, the representation of the audio itself is important. Many researchers have applied Deep Belief Networks (DBNs) to learn new audio representations. The mel filter-bank feature, which is computed on the mel scale, is commonly used as the low-level audio representation in DBN pre-processing. However, the mel bands may not be sufficient to represent the diverse audio events of the real world comprehensively, which makes it difficult for a DBN to learn good audio features. In this paper, we take two steps to explore and tackle this problem. First, we compare how different arrangements of frequency bands affect DBN feature learning for audio event recognition; the arrangements considered are mel bands, bark bands, linear bands and pyramid bands. Second, to exploit the different classification capabilities of the resulting DBN features on different audio events, we fuse them with the Adaboost algorithm. We conduct experiments on real datasets collected from the findsound website, and the results verify that the proposed audio event classification system, which uses diverse features selected by Adaboost from all sets of DBN features, outperforms a system that uses only one kind of DBN feature set.
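To make the difference between band arrangements concrete, the following is a minimal sketch that places band edges uniformly on either the mel scale or the linear frequency scale. It is an illustration only, not the authors' implementation: the function names, the band count of 8, the 0 to 8000 Hz range, and the choice of the common 2595 * log10(1 + f / 700) mel formula are all assumptions made here. The bark and pyramid arrangements named in the abstract are not covered, since the abstract does not define them.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using a common formula (assumed, not from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def band_edges(n_bands, f_min, f_max, scale="mel"):
    """Return n_bands + 2 edge frequencies in Hz, equally spaced on `scale`.

    Hypothetical helper: consecutive triples of edges would define the
    lower edge, center, and upper edge of one triangular filter.
    """
    if scale == "mel":
        lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
        to_hz = mel_to_hz
    elif scale == "linear":
        lo, hi = f_min, f_max
        to_hz = lambda x: x
    else:
        raise ValueError("unsupported scale: " + scale)
    step = (hi - lo) / (n_bands + 1)
    return [to_hz(lo + i * step) for i in range(n_bands + 2)]


# Illustrative settings (assumed): 8 bands between 0 Hz and 8000 Hz.
mel_edges = band_edges(8, 0.0, 8000.0, scale="mel")
lin_edges = band_edges(8, 0.0, 8000.0, scale="linear")

# Width of each band-edge interval in Hz.
mel_widths = [b - a for a, b in zip(mel_edges, mel_edges[1:])]
lin_widths = [b - a for a, b in zip(lin_edges, lin_edges[1:])]
```

Under these assumptions the mel intervals grow with frequency, so low frequencies are resolved finely and high frequencies coarsely, while the linear intervals all have the same width. That difference in resolution is one reason a single arrangement may suit some audio events better than others.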
Pages: 6