DNN-HMM based Automatic Speech Recognition for HRI Scenarios

Cited by: 24
Authors
Novoa, Jose [1]
Wuth, Jorge [1]
Pablo Escudero, Juan [1]
Fredes, Josue [1]
Mahu, Rodrigo [1]
Becerra Yoma, Nestor [1]
Affiliations
[1] Univ Chile, Speech Proc & Transmiss Lab, Av Tupper 2007, Santiago, Chile
Keywords
DNN-HMM; time-varying acoustic channel; speech recognition; convolutional neural networks
DOI
10.1145/3171221.3171280
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we propose to replace the classical black-box integration of automatic speech recognition technology in HRI applications with an approach that incorporates the representation and modeling of the HRI environment, as well as the states and contexts of the robot and the user. Accordingly, this paper focuses on environment representation and modeling by training a deep neural network-hidden Markov model (DNN-HMM) based automatic speech recognition engine on clean utterances combined with the acoustic-channel responses and noise obtained from an HRI testbed built with a PR2 mobile manipulation robot. This method avoids recording a training database in every possible acoustic environment of a given HRI scenario. Moreover, different speech recognition testing conditions were produced by recording two types of acoustic sources, i.e., a loudspeaker and human speakers, using a Microsoft Kinect mounted on top of the PR2 robot while it performed head rotations and movements towards and away from the fixed sources. In this generic HRI scenario, the resulting automatic speech recognition engine, trained with a limited amount of data, provided a word error rate at least 26% and 38% lower than that of publicly available speech recognition APIs on the playback (i.e., loudspeaker) and human testing databases, respectively.
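The abstract describes building training data by combining clean utterances with measured acoustic-channel responses and noise. A minimal sketch of that general idea follows. It is not the authors' pipeline: the function name `contaminate`, the `snr_db` parameter, the truncation to the clean-signal length, and the synthetic signals are all assumptions made for illustration, and the abstract does not state how noise levels were chosen.

```python
import numpy as np


def contaminate(clean, impulse_response, noise, snr_db):
    """Apply a channel response to clean speech, then add noise at snr_db.

    Hypothetical helper for illustration; assumes non-silent 1-D inputs.
    """
    # Channel effect: linear convolution, truncated to the clean length.
    reverberant = np.convolve(clean, impulse_response)[: len(clean)]

    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]

    # Scale the noise so that speech power / noise power = 10^(snr_db / 10).
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise


# Synthetic stand-ins for a clean utterance, a measured response, and noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
ir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)
noise = rng.standard_normal(4000)
noisy = contaminate(clean, ir, noise, snr_db=10.0)
```

In practice the synthetic arrays would be replaced by recorded clean speech, impulse responses estimated in the target environment, and recorded robot or room noise, all at the same sampling rate.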
Pages: 150-159
Number of pages: 10
Related papers
50 in total
  • [1] Research on Speech Accurate Recognition Technology Based on Deep Learning DNN-HMM
    Xia Wanyu
    Qiu Wu
    Feng Xiancheng
    [J]. MIPPR 2019: PATTERN RECOGNITION AND COMPUTER VISION, 2020, 11430
  • [2] Mismatched Training Data Enhancement for Automatic Recognition of Children's Speech using DNN-HMM
    Qian, Mengjie
    McLoughlin, Ian
    Guo, Wu
    Dai, Lirong
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [3] Contaminated speech training methods for robust DNN-HMM distant speech recognition
    Ravanelli, Mirco
    Omologo, Maurizio
    [J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 756 - 760
  • [4] Multilingual Approach to Joint Speech and Accent Recognition with DNN-HMM Framework
    Peng, Yizhou
    Zhang, Jicheng
    Zhang, Haobo
    Xu, Haihua
    Huang, Hao
    Li, Sheng
    Chng, Eng Siong
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1043 - 1048
  • [5] Comparison of syllable-based and phoneme-based DNN-HMM in Japanese Speech Recognition
    Seki, Hiroshi
    Yamamoto, Kazumasa
    Nakagawa, Seiichi
    [J]. 2014 INTERNATIONAL CONFERENCE OF ADVANCED INFORMATICS: CONCEPT, THEORY AND APPLICATION (ICAICTA), 2014, : 249 - 254
  • [6] Automatic Speech Recognition for Indoor HRI Scenarios
    Novoa, Jose
    Mahu, Rodrigo
    Wuth, Jorge
    Pablo Escudero, Juan
    Fredes, Josue
    Becerra Yoma, Nestor
    [J]. ACM TRANSACTIONS ON HUMAN-ROBOT INTERACTION, 2021, 10 (02)
  • [7] Neural Speech-to-Text Language Models for Rescoring Hypotheses of DNN-HMM Hybrid Automatic Speech Recognition Systems
    Tanaka, Tomohiro
    Masumura, Ryo
    Moriya, Takafumi
    Aono, Yushi
    [J]. 2018 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2018, : 196 - 200
  • [8] Comparison of DCT and Autoencoder-based Features for DNN-HMM Multimodal Silent Speech Recognition
    Liu, Licheng
    Ji, Yan
    Wang, Hongcui
    Denby, Bruce
    [J]. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2016
  • [9] A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Sharable Knowledge
    Tanaka, Tomohiro
    Masumura, Ryo
    Moriya, Takafumi
    Oba, Takanobu
    Aono, Yushi
    [J]. INTERSPEECH 2019, 2019, : 2210 - 2214
  • [10] Labeling Unsegmented Sequence Data with DNN-HMM and Its Application for Speech Recognition
    Li, Xiangang
    Wu, Xihong
    [J]. 2014 9TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2014, : 10 - 14