Acoustic Modeling with Deep Neural Networks Using Raw Time Signal for LVCSR

Times cited: 0

Authors:

Tueske, Zoltan [1]

Golik, Pavel [1]

Schluter, Ralf [1]

Ney, Hermann [1, 2]

Affiliations:

[1] RWTH Aachen University (Rheinisch-Westfälische Technische Hochschule Aachen), Dept. of Computer Science, Human Language Technology and Pattern Recognition, D-52056 Aachen, Germany

[2] LIMSI-CNRS, Spoken Language Processing Group, Paris, France

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Keywords:

acoustic modeling; raw signal; neural networks

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

In this paper we investigate how much feature extraction is required by a deep neural network (DNN) based acoustic model for automatic speech recognition (ASR). We decompose the feature extraction pipeline of a state-of-the-art ASR system step by step and evaluate acoustic models trained on standard MFCC features, critical band energies (CRBE), the FFT magnitude spectrum, and even the raw time signal. The focus is on the raw time signal as input, i.e. essentially no feature extraction prior to DNN training. Notably, the gap in recognition accuracy between MFCC and raw time signal input decreases strongly once we switch from the sigmoid activation function to rectified linear units, offering a real alternative to standard signal processing. An analysis of the first-layer weights reveals that the DNN can discover multiple band-pass filters in the time domain. We therefore try to improve the raw time signal based system by initializing the first hidden layer weights with the impulse responses of an audiologically motivated filter bank. Inspired by the multi-resolution analysis layer learned automatically from the raw time signal input, we train the DNN on a combination of multiple short-term features. This illustrates how the DNN can learn from the small differences between MFCC, PLP and Gammatone features, suggesting that it is useful to present the DNN with different views of the underlying audio.
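The abstract describes initializing the first hidden layer, which sees a window of raw samples, with impulse responses of an audiologically motivated filter bank, followed by rectified linear units. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation. It assumes a fourth-order gammatone filter bank with bandwidth factor b = 1.019, center frequencies spaced on the Glasberg and Moore ERB scale, a 16 kHz sampling rate, a 100 to 7500 Hz range, and a 256-sample window. All of these values, and the helper names `erb_hz`, `erb_space`, `gammatone_ir`, `init_first_layer` and `relu_layer`, are hypothetical choices made for illustration. Each weight row stores a time-reversed, unit-norm impulse response, so that the dot product with an input window (oldest sample first) equals one output sample of the corresponding FIR band-pass filter; this orientation is also an assumption.

```python
import math


def erb_hz(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg & Moore, 1990) in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)


def erb_space(low_hz, high_hz, num):
    """Center frequencies equally spaced on the ERB-rate scale (num >= 2)."""
    def to_erb_rate(f):
        return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

    def from_erb_rate(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    lo, hi = to_erb_rate(low_hz), to_erb_rate(high_hz)
    step = (hi - lo) / (num - 1)
    return [from_erb_rate(lo + i * step) for i in range(num)]


def gammatone_ir(fc_hz, sample_rate, length, order=4, b=1.019):
    """Sampled gammatone impulse response
    t^(order-1) * exp(-2*pi*b*ERB(fc)*t) * cos(2*pi*fc*t), scaled to unit L2 norm."""
    ir = []
    for k in range(length):
        t = k / sample_rate
        envelope = t ** (order - 1) * math.exp(-2.0 * math.pi * b * erb_hz(fc_hz) * t)
        ir.append(envelope * math.cos(2.0 * math.pi * fc_hz * t))
    norm = math.sqrt(sum(x * x for x in ir)) or 1.0
    return [x / norm for x in ir]


def init_first_layer(num_units, window, sample_rate=16000, low_hz=100.0, high_hz=7500.0):
    """Hypothetical initializer: one row per hidden unit, each spanning `window` raw samples.

    Rows are time-reversed impulse responses, so the dot product with an input
    window (oldest sample first) equals one FIR filter output sample.
    """
    centers = erb_space(low_hz, high_hz, num_units)
    return [gammatone_ir(fc, sample_rate, window)[::-1] for fc in centers]


def relu_layer(weights, biases, samples):
    """Affine map followed by rectified linear units: max(0, w . x + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, samples)) + b)
            for row, b in zip(weights, biases)]


# Usage: 8 gammatone-initialized units over a 256-sample (16 ms at 16 kHz) window,
# applied to a 1 kHz test tone.
W = init_first_layer(num_units=8, window=256)
tone = [math.sin(2.0 * math.pi * 1000.0 * k / 16000) for k in range(256)]
activations = relu_layer(W, [0.0] * len(W), tone)
```

Spacing the center frequencies on the ERB scale gives finer frequency resolution at low frequencies, in the spirit of an auditory filter bank; the abstract does not state the filter order, spacing, frequency range or window length actually used, so these would need to be checked against the full paper. After initialization, the weights would normally continue to be trained together with the rest of the network.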


Pages: 890-894

Page count: 5
