Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals

Cited by: 60
Authors
Taniguchi, Tadahiro [1]
Nakashima, Ryo [2]
Liu, Hailong [2]
Nagasaka, Shogo [2]
Affiliations
[1] Ritsumeikan Univ, Coll Informat Sci & Engn, Kusatsu, Japan
[2] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Japan
Keywords
Bayesian nonparametrics; deep learning; speech recognition; unsupervised learning; word discovery; DRIVING BEHAVIOR; SEGMENTATION; ROBOTICS; MODEL
DOI
10.1080/01691864.2016.1159981
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Discovering words directly from audio speech signals is a challenging problem for a developmental robot. Human infants can discover words directly from speech signals. To understand this developmental capability through a constructive approach, it is important to build a machine learning system that can acquire knowledge about words and phonemes, i.e. a language model and an acoustic model, autonomously and without supervision. To this end, this paper proposes the nonparametric Bayesian double articulation analyzer (NPB-DAA) combined with the deep sparse autoencoder (DSAE). The NPB-DAA was previously proposed to achieve fully unsupervised direct word discovery from speech signals. Although it outperformed pre-existing unsupervised learning methods, its performance was still unsatisfactory. In this paper, we integrate the NPB-DAA with the DSAE, a neural network model that can be trained in an unsupervised manner, and demonstrate the performance of the combination in an experiment on direct word discovery from auditory speech signals. The experiment shows that the combined method outperforms pre-existing unsupervised learning methods and achieves state-of-the-art performance. It also shows that the proposed method outperforms several standard speech recognizer-based methods that use true word dictionaries.
Pages: 770-783
Number of pages: 14