Double articulation analyzer with deep sparse autoencoder for unsupervised word discovery from speech signals

Cited by: 60

Authors:

Taniguchi, Tadahiro [1]

Nakashima, Ryo [2]

Liu, Hailong [2]

Nagasaka, Shogo [2]

Affiliations:

[1] Ritsumeikan Univ, Coll Informat Sci & Engn, Kusatsu, Japan

[2] Ritsumeikan Univ, Grad Sch Informat Sci & Engn, Kusatsu, Japan

Source:

ADVANCED ROBOTICS | 2016 / Vol. 30 / Issue 11-12

Keywords:

Bayesian nonparametrics; deep learning; speech recognition; unsupervised learning; word discovery; DRIVING BEHAVIOR; SEGMENTATION; ROBOTICS; MODEL;

DOI:

10.1080/01691864.2016.1159981

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Direct word discovery from audio speech signals is a very difficult and challenging problem for a developmental robot. Human infants are able to discover words directly from speech signals, and, to understand human infants' developmental capability using a constructive approach, it is very important to build a machine learning system that can acquire knowledge about words and phonemes, i.e. a language model and an acoustic model, autonomously in an unsupervised manner. To achieve this, the nonparametric Bayesian double articulation analyzer (NPB-DAA) with the deep sparse autoencoder (DSAE) is proposed in this paper. The NPB-DAA has been proposed to achieve totally unsupervised direct word discovery from speech signals. However, the performance was still unsatisfactory, although it outperformed pre-existing unsupervised learning methods. In this paper, we integrate the NPB-DAA with the DSAE, which is a neural network model that can be trained in an unsupervised manner, and demonstrate its performance through an experiment about direct word discovery from auditory speech signals. The experiment shows that the combined method, the NPB-DAA with the DSAE, outperforms pre-existing unsupervised learning methods, and shows state-of-the-art performance. It is also shown that the proposed method outperforms several standard speech recognizer-based methods with true word dictionaries.

Pages: 770-783

Page count: 14
