SPEECH DISFLUENCIES MODELING IN AUTOMATIC SPEECH RECOGNITION SYSTEMS

Cited by: 0

Authors:

Verkhodanova, Vasilisa O. [1]

Karpov, Alexey A. [1]

Affiliation:

[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg, Russia

Source:

TOMSK STATE UNIVERSITY JOURNAL | 2012 / No. 363

Keywords:

speech disfluencies; automatic speech recognition; speech analysis

DOI:

Not available

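Chinese Library Classification (CLC) codes:

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]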

Subject classification codes:

07; 0710; 09

Abstract:

In this paper, the authors address the problem of analyzing speech disfluencies for automatic speech recognition (ASR). Speech disfluencies can have different origins: they may be caused by external influences or by internal failures in planning the speech act. Such planning failures take various forms, and the resulting disfluencies include filled pauses, self-repairs, and stipulations. These disfluencies are an obstacle to the automatic processing of speech and of its transcriptions. Speech disfluencies are studied using speech corpora with Rich Transcription (transcription in which phenomena such as sentence boundaries, fillers, and disfluencies are marked), for example the Czech Broadcast Conversation MDE Transcripts and SWITCHBOARD. It is still unclear what knowledge ASR systems should use to classify and detect speech disfluencies, and for that reason there are as yet no adequate models of them that would enable their automatic processing. Processing methods can be divided into those that handle disfluencies with acoustic models and those that use combined (acoustic and language) models. However, for practical reasons (the time and expert effort required), researchers frequently use only acoustic modeling in speech recognition systems. Many papers describe the modeling of speech disfluencies as part of ASR systems. Another group of approaches aims to increase recognition accuracy by separating disfluencies from the speech signal in advance or by means of speech transcriptions. Among the possible ways of handling these phenomena in ASR systems, some approaches model and detect disfluencies as separate verbal and paralinguistic elements, while others merely distinguish them from useful speech without telling one type of disfluency from another.
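The contrast between approaches that tell disfluency types apart and approaches that only separate disfluencies from useful speech can be illustrated on a single richly transcribed utterance. The following is a minimal sketch under stated assumptions: the bracket markup (`{F ...}` for a filled pause, `[ reparandum + repair ]` for a self-repair) is a simplified, hypothetical convention loosely modeled on bracketed disfluency annotation, not the exact format of the Czech Broadcast Conversation MDE Transcripts or SWITCHBOARD, and the function names `label_disfluencies` and `strip_disfluencies` are illustrative only.

```python
import re
from typing import List, Tuple

# Hypothetical, simplified markup (an assumption, not any corpus's exact scheme):
#   {F uh}             -> filled pause
#   [ to go + to fly ] -> self-repair; words before "+" are the reparandum
TOKEN_RE = re.compile(r"\{F\s+([^}]*)\}|[\[\]+]|[^\s\[\]+{}]+")


def label_disfluencies(transcript: str) -> List[Tuple[str, str]]:
    """Tag each word as FLUENT, FILLED_PAUSE or REPARANDUM (types kept apart)."""
    labelled: List[Tuple[str, str]] = []
    in_reparandum = False  # True between "[" and "+"
    for match in TOKEN_RE.finditer(transcript):
        token, filler = match.group(0), match.group(1)
        if filler is not None:
            # Every word inside {F ...} is a filled pause.
            labelled.extend((word, "FILLED_PAUSE") for word in filler.split())
        elif token == "[":
            in_reparandum = True
        elif token in ("+", "]"):
            in_reparandum = False
        else:
            labelled.append((token, "REPARANDUM" if in_reparandum else "FLUENT"))
    return labelled


def strip_disfluencies(transcript: str) -> List[str]:
    """Keep only fluent words, without distinguishing disfluency types."""
    return [word for word, tag in label_disfluencies(transcript) if tag == "FLUENT"]


if __name__ == "__main__":
    utterance = "{F uh} I want [ to go + to fly ] to Moscow"
    print(label_disfluencies(utterance))
    print(strip_disfluencies(utterance))
```

For the sample utterance, `label_disfluencies` tags `uh` as a filled pause and the first `to go` as the reparandum, while `strip_disfluencies` keeps only the words `I want to fly to Moscow`. The first function mirrors modeling disfluencies as separate elements; the second mirrors merely filtering them out of useful speech.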
An alternative method processes disfluencies as part of language modeling and unknown-word modeling: speech disfluencies are treated as a class of unknown words, and a language model that accounts for these phenomena is then built. No methods for processing speech disfluencies have yet been developed for Russian, so it is worth applying different methods and comparing the results. Because building a transcript corpus that annotates speech disfluencies and is suitable for training a language model (at least a 3-gram model) is very costly, processing speech disfluencies with parametric methods appears to be the optimal approach.
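To make the unknown-word-class idea concrete, here is a minimal sketch, not the authors' model: every word in a caller-supplied set of disfluent forms is mapped to one shared class token (the token name `<UNK>`, the class name `ClassTrigramLM`, and the example forms are assumptions), and trigram probabilities are then estimated by unsmoothed maximum likelihood, P(w3 | w1, w2) = c(w1, w2, w3) / c(w1, w2). A practical language model would add smoothing (for example Kneser-Ney) and use a far larger corpus.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

BOS, EOS = "<s>", "</s>"
DISFLUENCY_CLASS = "<UNK>"  # assumed name of the shared class token


class ClassTrigramLM:
    """Unsmoothed maximum-likelihood trigram model in which all disfluent forms
    share one class token (an illustrative sketch, not a production LM)."""

    def __init__(self, corpus: Iterable[List[str]], disfluent_forms: Set[str]):
        self.disfluent_forms = disfluent_forms
        self.trigrams: Counter = Counter()   # c(w1, w2, w3)
        self.histories: Counter = Counter()  # c(w1, w2) as a trigram history
        for sentence in corpus:
            # Pad with two start symbols so the first word has a full history.
            tokens = [BOS, BOS] + [self._map(w) for w in sentence] + [EOS]
            for i in range(2, len(tokens)):
                history: Tuple[str, str] = (tokens[i - 2], tokens[i - 1])
                self.trigrams[history + (tokens[i],)] += 1
                self.histories[history] += 1

    def _map(self, word: str) -> str:
        """Replace any disfluent form by the shared class token."""
        return DISFLUENCY_CLASS if word in self.disfluent_forms else word

    def prob(self, w1: str, w2: str, w3: str) -> float:
        """MLE estimate of P(w3 | w1, w2); 0.0 for an unseen history."""
        w1, w2, w3 = self._map(w1), self._map(w2), self._map(w3)
        history_count = self.histories[(w1, w2)]
        if history_count == 0:
            return 0.0
        return self.trigrams[(w1, w2, w3)] / history_count


if __name__ == "__main__":
    corpus = [["I", "uh", "want", "tea"], ["I", "um", "want", "coffee"], ["I", "want", "tea"]]
    lm = ClassTrigramLM(corpus, disfluent_forms={"uh", "um"})
    print(lm.prob("I", "uh", "want"), lm.prob("I", "um", "want"))
```

On the three toy sentences, both fillers are pooled into one class, so `prob("I", "uh", "want")` and `prob("I", "um", "want")` are both 1.0. Sharing a single class lets sparse disfluency counts support each other, which may help when annotated training data is scarce.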


Start page: 10

Number of pages: 7
