SPEECH DISFLUENCIES MODELING IN AUTOMATIC SPEECH RECOGNITION SYSTEMS

Authors
Verkhodanova, Vasilisa O. [1]
Karpov, Alexey A. [1]
Affiliations
[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg, Russia
Keywords
speech disfluencies; automatic speech recognition; speech analysis;
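DOI
Not available
Chinese Library Classification number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code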
07 ; 0710 ; 09 ;
Abstract
This paper addresses the problem of analyzing speech disfluencies for automatic speech recognition (ASR). Speech disfluencies may differ in origin: they may be caused by external influence or by an internal failure in planning the speech act. Failures in speech act planning take various forms; the resulting disfluencies include filled pauses, self-repairs, and slips of the tongue. Such disfluencies are an obstacle to the automatic processing of speech and its transcriptions. Speech corpora with Rich Transcription (transcription in which phenomena such as sentence boundaries, fillers, and disfluencies are marked) are used to study speech disfluencies. Such corpora include Czech Broadcast Conversation MDE Transcripts and SWITCHBOARD. It is still unclear what knowledge speech recognition systems should use to classify and detect speech disfluencies, which is why there are no adequate models that would support automatic disfluency processing. Methods for such processing can be divided into those that handle disfluencies with acoustic models alone and those that use combined models (acoustic and language). For practical reasons (the time and expert effort required), however, researchers frequently use only acoustic modeling in speech recognition systems. Many papers describe the modeling of speech disfluencies as part of ASR systems. Another group of approaches aims to increase recognition accuracy by separating disfluencies from the speech signal in advance or by means of speech transcriptions. Among the possible approaches to these phenomena in ASR systems, some model and detect disfluencies as separate verbal and paralinguistic elements, while others ignore them, only distinguishing them from useful speech without telling one type from another.
An alternative method processes disfluencies as part of language modeling and unknown-word modeling: speech disfluencies may be treated as an Unknown Words class, and a language model is then built that accounts for these phenomena. No methods for processing speech disfluencies have been developed for the Russian language, so it is worth applying different methods and comparing the results. Because of the high cost of creating a corpus of transcripts that accounts for speech disfluencies and is suitable for training a language model (at least a 3-gram model), processing speech disfluencies with parametric methods seems to be optimal.
Pages: 10 / +
Number of pages: 7