Improved Bottleneck Features Using Pretrained Deep Neural Networks

Cited by: 0
Authors
Yu, Dong
Seltzer, Michael L.
Institutions
Keywords
bottleneck features; pretraining; deep neural network; deep belief network; BOTTLENECK FEATURES; LVCSR;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Bottleneck features have been shown to be effective in improving the accuracy of automatic speech recognition (ASR) systems. Conventionally, bottleneck features are extracted from a multi-layer perceptron (MLP) trained to predict context-independent monophone states. The MLP typically has three hidden layers and is trained using the backpropagation algorithm. In this paper, we propose two improvements to the training of bottleneck features motivated by recent advances in the use of deep neural networks (DNNs) for speech recognition. First, we show how unsupervised pretraining of a DNN enhances the network's discriminative power and improves the bottleneck features it generates. Second, we show that a neural network trained to predict context-dependent senone targets produces better bottleneck features than one trained to predict monophone states. Bottleneck features trained using the proposed methods produced a 16% relative reduction in sentence error rate over conventional bottleneck features on a large vocabulary business search task.
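The extraction scheme the abstract describes can be sketched in a few lines of numpy: forward-propagate an acoustic frame through a deep network up to its narrow middle layer and take that layer's activations as the bottleneck features. This is a minimal illustrative toy, not the paper's implementation: the layer sizes, sigmoid activations, and random stand-in weights are assumptions; in the paper the weights would come from DBN pretraining followed by backpropagation against senone targets.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy layer sizes (assumed for illustration): 39-dim acoustic input,
# wide hidden layers, a 42-dim bottleneck, and a large senone output layer.
sizes = [39, 1024, 42, 1024, 1500]
weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def bottleneck_features(frame, bottleneck_layer=2):
    """Forward-propagate up to the bottleneck layer and return its activations."""
    h = frame
    for i in range(bottleneck_layer):
        h = sigmoid(h @ weights[i] + biases[i])
    return h  # narrow-layer activations used as features for the ASR front end

frame = rng.normal(size=39)        # one acoustic feature frame (e.g., MFCCs + deltas)
feats = bottleneck_features(frame)
print(feats.shape)                 # (42,)
```

At recognition time, such bottleneck activations are typically appended to (or used in place of) the standard spectral features before training the acoustic model, which is why the quality of the network producing them matters.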
Pages: 244-247
Page count: 4
Related Papers
50 items total
  • [1] Efficient deep neural networks for speech synthesis using bottleneck features
    Joo, Young-Sun
    Jun, Won-Suk
    Kang, Hong-Goo
    [J]. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA), 2016,
  • [2] Speech Bandwidth Extension Using Bottleneck Features and Deep Recurrent Neural Networks
    Gu, Yu
    Ling, Zhen-Hua
    Dai, Li-Rong
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 297 - 301
  • [3] Emotion Recognition Using Pretrained Deep Neural Networks
    Dobes, Marek
    Sabolova, Natalia
    [J]. ACTA POLYTECHNICA HUNGARICA, 2023, 20 (04) : 195 - 204
  • [4] SPEAKER ADAPTIVE TRAINING IN DEEP NEURAL NETWORKS USING SPEAKER DEPENDENT BOTTLENECK FEATURES
    Doddipatla, Rama
    [J]. 2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5290 - 5294
  • [5] Evaluation of bottom-up saliency model using deep features pretrained by deep convolutional neural networks
    Mahdi, Ali
    Qin, Jun
    [J]. JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (03)
  • [6] Detection of Diabetic Retinopathy Using Pretrained Deep Neural Networks
    Kajan, Slavomir
    Goga, Jozef
    Lacko, Kristian
    Pavlovicova, Jarmila
    [J]. PROCEEDINGS OF THE 2020 30TH INTERNATIONAL CONFERENCE CYBERNETICS & INFORMATICS (K&I '20), 2020,
  • [7] Investigation of Bottleneck Features and Multilingual Deep Neural Networks for Speaker Verification
    Tian, Yao
    Cai, Meng
    He, Liang
    Liu, Jia
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 1151 - 1155
  • [8] AUTO-ENCODER BOTTLENECK FEATURES USING DEEP BELIEF NETWORKS
    Sainath, Tara N.
    Kingsbury, Brian
    Ramabhadran, Bhuvana
    [J]. 2012 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2012, : 4153 - 4156
  • [9] Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
    Wu, Zhizheng
    King, Simon
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 309 - 313
  • [10] The Effectiveness of Using a Pretrained Deep Learning Neural Networks for Object Classification in Underwater Video
    Szymak, Piotr
    Piskur, Pawel
    Naus, Krzysztof
    [J]. REMOTE SENSING, 2020, 12 (18)