Prosodic knowledge sources for automatic speech recognition

Cited by: 0
Authors
Vergyri, D [1]
Stolcke, A [1]
Gadde, VRR [1]
Ferrer, L [1]
Shriberg, E [1]
Affiliations
[1] SRI Int, Speech Technol & Res Lab, Menlo Pk, CA 94025 USA
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In this work, different prosodic knowledge sources are integrated into a state-of-the-art large-vocabulary speech recognition system. Prosody manifests itself at different levels in the speech signal: within words, as changes in phone durations and pitch; between words, as variation in pause length; and beyond words, correlating with higher linguistic structures and nonlexical phenomena. We investigate three models, each exploiting a different level of prosodic information, in rescoring N-best hypotheses according to how well the recognized words correspond to prosodic features of the utterance. Experiments on the Switchboard corpus show word accuracy improvements with each prosodic knowledge source. A further improvement is observed with the combination of all models, demonstrating that they each capture somewhat different prosodic characteristics of the speech signal.
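The N-best rescoring idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the scores are combined, so the weighted sum of log scores used here is an assumption, as are all names (`Hypothesis`, `combined_score`, `rescore`), the knowledge-source labels, the weights, and the toy numbers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Hypothesis:
    """One N-best entry with a log score per knowledge source (hypothetical layout)."""
    words: List[str]
    scores: Dict[str, float] = field(default_factory=dict)


def combined_score(hyp: Hypothesis, weights: Dict[str, float]) -> float:
    """Weighted sum of log scores; sources missing from the hypothesis count as 0."""
    return sum(w * hyp.scores.get(name, 0.0) for name, w in weights.items())


def rescore(nbest: List[Hypothesis], weights: Dict[str, float]) -> Hypothesis:
    """Return the hypothesis with the highest combined score."""
    return max(nbest, key=lambda h: combined_score(h, weights))


# Toy N-best list; every number below is invented for illustration.
nbest = [
    Hypothesis(["i", "see"],
               {"acoustic": -10.0, "lm": -4.0, "duration": -3.0, "pause": -2.0}),
    Hypothesis(["icy"],
               {"acoustic": -9.5, "lm": -4.2, "duration": -5.0, "pause": -3.0}),
]

# Baseline: acoustic and language-model scores only.
baseline_weights = {"acoustic": 1.0, "lm": 1.0}

# Assumed weights that add two prosodic sources to the baseline.
prosody_weights = {"acoustic": 1.0, "lm": 1.0, "duration": 0.5, "pause": 0.5}

print(rescore(nbest, baseline_weights).words)
print(rescore(nbest, prosody_weights).words)
```

In this toy case the baseline weights pick the second hypothesis, while adding the prosodic scores moves the first hypothesis to the top. In practice the weights would be tuned on held-out data rather than set by hand.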
Pages: 208-211
Number of pages: 4
Related papers
50 in total
  • [1] Prosodic and accentual information for automatic speech recognition
    Milone, DH
    Rubio, AJ
    [J]. IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11(4): 321-333
  • [2] THE USE OF SPEECH KNOWLEDGE IN AUTOMATIC SPEECH RECOGNITION
    ZUE, VW
    [J]. PROCEEDINGS OF THE IEEE, 1985, 73(11): 1602-1615
  • [3] Speech production knowledge in automatic speech recognition
    King, Simon
    Frankel, Joe
    Livescu, Karen
    McDermott, Erik
    Richmond, Korin
    Wester, Mirjam
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2007, 121(2): 723-742
  • [4] Intelligibility Rating with Automatic Speech Recognition, Prosodic, and Cepstral Evaluation
    Haderlein, Tino
    Moers, Cornelia
    Moebius, Bernd
    Rosanowski, Frank
    Noeth, Elmar
    [J]. TEXT, SPEECH AND DIALOGUE, TSD 2011, 2011, 6836: 195-202
  • [5] Predicting automatic speech recognition performance using prosodic cues
    Litman, DJ
    Hirschberg, JB
    Swerts, M
    [J]. 6TH APPLIED NATURAL LANGUAGE PROCESSING CONFERENCE/1ST MEETING OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE AND PROCEEDINGS OF THE ANLP-NAACL 2000 STUDENT RESEARCH WORKSHOP, 2000: A218-A225
  • [6] AN INTEGRATED KNOWLEDGE BASE FOR SPEECH SYNTHESIS AND AUTOMATIC SPEECH RECOGNITION
    TATHAM, MAA
    [J]. JOURNAL OF PHONETICS, 1985, 13(2): 175-188
  • [7] Speech activity detection and automatic prosodic processing unit segmentation for emotion recognition
    Sztaho, David
    Vicsi, Klara
    [J]. INTELLIGENT DECISION TECHNOLOGIES-NETHERLANDS, 2014, 8(4): 315-324
  • [8] AUTOMATIC DETECTION OF PROSODIC BOUNDARIES IN SPEECH
    CAMPBELL, N
    [J]. SPEECH COMMUNICATION, 1993, 13(3-4): 343-354
  • [9] Sound Source Separation and Automatic Speech Recognition for Moving Sources
    Nakadai, Kazuhiro
    Nakajima, Hirofumi
    Ince, Goekhan
    Hasegawa, Yuji
    [J]. IEEE/RSJ 2010 INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS 2010), 2010: 976-981
  • [10] Adaptive fusion of acoustic and visual sources for automatic speech recognition
    Rogozan, A
    Deléglise, P
    [J]. SPEECH COMMUNICATION, 1998, 26(1-2): 149-161