Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition

Cited by: 0

Authors:

Wang, Jun ^{[1,2]}

Samal, Ashok ^{[3]}

Green, Jordan R. ^{[4]}

Affiliations:

[1] Univ Texas Dallas, Dept Bioengn, Dallas, TX USA

[2] Univ Texas Dallas, Callier Ctr Commun Disorders, Dallas, TX USA

[3] Univ Nebraska, Dept Comp Sci & Engn, Lincoln, NE 68588 USA

[4] MGH Inst Hlth Profess, Dept Commun Sci & Disorders, Boston, MA USA

Source:

15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4 | 2014

Funding:

National Institutes of Health (USA);

Keywords:

silent speech recognition; speech kinematics; Procrustes analysis; support vector machine; VOICE; ELECTROLARYNX; KNOWLEDGE; SYSTEM;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Silent speech interfaces (SSIs), which recognize speech from articulatory information (i.e., without using audio information), have the potential to enable persons with laryngectomy or a neurological disease to produce synthesized speech with a natural-sounding voice using their tongue and lips. Current approaches to SSIs have largely relied on speaker-dependent recognition models to minimize the negative effects of talker variation on recognition accuracy. Speaker-independent approaches are needed to reduce the large amount of training data required from each user; only limited articulatory samples are often available for persons with moderate to severe speech impairments, due to the logistical difficulty of data collection. This paper reports an across-speaker articulatory normalization approach based on Procrustes matching, a bidimensional regression technique for removing translational, scaling, and rotational effects of spatial data. A dataset of short functional sentences was collected from seven English talkers. A support vector machine was then trained to classify sentences based on normalized tongue and lip movements. Speaker-independent classification accuracy (tested using leave-one-subject-out cross-validation) improved significantly, from 68.63% to 95.90%, following normalization. These results support the feasibility of a speaker-independent SSI using Procrustes matching as the basis for articulatory normalization across speakers.
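
The abstract describes Procrustes matching as removing translational, scaling, and rotational effects from spatial data. The sketch below illustrates that general idea on a 2-D point set with NumPy. It is a generic textbook formulation, not necessarily the procedure used in the paper: the function name `procrustes_normalize`, the `(n_points, 2)` input layout, and the choice of aligning to a reference shape by SVD are all assumptions made for illustration.

```python
import numpy as np


def procrustes_normalize(shape, reference=None):
    """Remove translation, scale and (optionally) rotation from a 2-D shape.

    shape, reference: arrays of (n_points, 2) x/y coordinates (assumed layout).
    """
    X = np.asarray(shape, dtype=float)

    # 1. Translation: move the centroid to the origin.
    X = X - X.mean(axis=0)

    # 2. Scaling: divide by the Frobenius norm so the shape has unit size.
    size = np.linalg.norm(X)
    if size == 0:
        raise ValueError("shape has no spatial extent")
    X = X / size

    if reference is None:
        return X

    # 3. Rotation: find the rotation Q that minimizes ||X Q - Y||_F,
    #    where Y is the centered, unit-size reference shape.
    Y = procrustes_normalize(reference)
    U, _, Vt = np.linalg.svd(X.T @ Y)
    # Flip the last axis if needed so Q is a pure rotation (no reflection).
    d = np.sign(np.linalg.det(U @ Vt))
    Q = U @ np.diag([1.0, d]) @ Vt
    return X @ Q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(10, 2))          # hypothetical reference shape
    theta = 0.7
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta), np.cos(theta)]])
    moved = 2.5 * ref @ R + np.array([3.0, -1.0])   # rotated, scaled, shifted
    aligned = procrustes_normalize(moved, ref)
    print(np.allclose(aligned, procrustes_normalize(ref)))
```

In a speaker-independent setting, one plausible use would be to normalize each speaker's articulatory point set this way before passing features to a classifier, so that differences in sensor placement and vocal tract size contribute less to the variation the classifier sees.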

Pages: 1179-1183

Number of pages: 5
