Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR

Cited by: 0
Authors
Tueske, Zoltan [1]
Golik, Pavel [1]
Schluter, Ralf [1]
Ney, Hermann [1, 2]
Affiliations
[1] Rhein Westfal TH Aachen, Dept Comp Sci, Human Language Technol & Pattern Recognit, D-52056 Aachen, Germany
[2] LIMSI CNRS, Spoken Language Proc Grp, Paris, France
Keywords
acoustic modeling; raw signal; neural networks
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we investigate how much feature extraction is required by a deep neural network (DNN) based acoustic model for automatic speech recognition (ASR). We decompose the feature extraction pipeline of a state-of-the-art ASR system step by step and evaluate acoustic models trained on standard MFCC features, critical band energies (CRBE), the FFT magnitude spectrum, and even the raw time signal. The focus is on the raw time signal as input, i.e. no feature extraction at all prior to DNN training. Notably, the gap in recognition accuracy between MFCC and the raw time signal narrows substantially once we switch from the sigmoid activation function to rectified linear units, making raw-signal input a real alternative to standard signal processing. Analysis of the first-layer weights reveals that the DNN can discover multiple band-pass filters in the time domain. We therefore try to improve the raw time signal based system by initializing the first hidden layer weights with impulse responses of an audiologically motivated filter bank. Inspired by the multi-resolution analysis layer learned automatically from raw time signal input, we train the DNN on a combination of multiple short-term features. This illustrates how the DNN can learn from the small differences between MFCC, PLP and Gammatone features, suggesting that it is useful to present the DNN with different views of the underlying audio.
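As a rough illustration of the initialization idea mentioned in the abstract, the sketch below fills a first-layer weight matrix with band-pass impulse responses and applies a rectified linear unit to one raw-signal frame. It is not the authors' implementation: the choice of a gammatone filter shape, the 16 kHz sample rate, the 400-sample window, the 50 filters, the geometric spacing of center frequencies, and all function names are assumptions made only for this example.

```python
import numpy as np


def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore approximation)."""
    return 24.7 * (4.37 * freq_hz / 1000.0 + 1.0)


def gammatone_impulse_responses(center_freqs_hz, num_samples, sample_rate_hz, order=4):
    """Return one unit-norm gammatone impulse response per row.

    Output shape: (num_filters, num_samples).
    """
    t = np.arange(num_samples) / sample_rate_hz             # time axis in seconds
    f = np.asarray(center_freqs_hz, dtype=float)[:, None]   # column of center frequencies
    b = 1.019 * erb_bandwidth(f)                            # per-filter bandwidth parameter
    g = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * f * t)
    g /= np.linalg.norm(g, axis=1, keepdims=True)           # scale every row to unit L2 norm
    return g


# Hypothetical settings; none of these values come from the abstract.
SAMPLE_RATE_HZ = 16000
WINDOW_SAMPLES = 400      # 25 ms of raw signal at 16 kHz
NUM_FILTERS = 50

center_freqs = np.geomspace(100.0, 4000.0, NUM_FILTERS)
first_layer_weights = gammatone_impulse_responses(
    center_freqs, WINDOW_SAMPLES, SAMPLE_RATE_HZ
)

# Forward pass of the first hidden layer on one stand-in frame of raw samples.
rng = np.random.default_rng(0)
frame = rng.standard_normal(WINDOW_SAMPLES)
hidden = np.maximum(0.0, first_layer_weights @ frame)       # rectified linear units
```

In an actual training setup these weights would only serve as a starting point and would be updated together with the rest of the network.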
Pages: 890-894
Page count: 5
Related papers
50 in total
  • [31] Multi-timescale Feature-extraction Architecture of Deep Neural Networks for Acoustic Model Training from Raw Speech Signal
    Takeda, Ryu
    Nakadai, Kazuhiro
    Komatani, Kazunori
    [J]. 2018 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2018, : 2503 - 2510
  • [32] Robust acoustic event classification using deep neural networks
    Sharan, Roneel V.
    Moir, Tom J.
    [J]. INFORMATION SCIENCES, 2017, 396 : 24 - 32
  • [33] MULTILINGUAL ACOUSTIC MODELS USING DISTRIBUTED DEEP NEURAL NETWORKS
    Heigold, G.
    Vanhoucke, V.
    Senior, A.
    Nguyen, P.
    Ranzato, M.
    Devin, M.
    Dean, J.
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8619 - 8623
  • [34] Analysis of Gastrointestinal Acoustic Activity Using Deep Neural Networks
    Ficek, Jakub
    Radzikowski, Kacper
    Nowak, Jan Krzysztof
    Yoshie, Osamu
    Walkowiak, Jaroslaw
    Nowak, Robert
    [J]. SENSORS, 2021, 21 (22)
  • [35] A greenhouse modeling and control using deep neural networks
    Salah, Latifa Belhaj
    Fourati, Fathi
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2021, 35 (15) : 1905 - 1929
  • [36] Intersensory Causality Modeling using Deep Neural Networks
    Noda, Kuniaki
    Arie, Hiroaki
    Suga, Yuki
    Ogata, Tetsuya
    [J]. 2013 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC 2013), 2013, : 1995 - 2000
  • [37] Reverberation robust acoustic modeling using i-vectors with time delay neural networks
    Peddinti, Vijayaditya
    Chen, Guoguo
    Povey, Daniel
    Khudanpur, Sanjeev
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 2440 - 2444
  • [38] Modeling time series data with deep Fourier neural networks
    Gashler, Michael S.
    Ashmore, Stephen C.
    [J]. NEUROCOMPUTING, 2016, 188 : 3 - 11
  • [39] RF Signal Transformation and Classification using Deep Neural Networks
    Khalid, Umar
    Karim, Nazmul
    Rahnavard, Nazanin
    [J]. BIG DATA IV: LEARNING, ANALYTICS, AND APPLICATIONS, 2022, 12097
  • [40] Seismic Signal Denoising and Decomposition Using Deep Neural Networks
    Zhu, Weiqiang
    Mousavi, S. Mostafa
    Beroza, Gregory C.
    [J]. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2019, 57 (11) : 9476 - 9488