Enriching speech recognition with automatic detection of sentence boundaries and disfluencies

Cited by: 156
Authors
Liu, Yang [1]
Shriberg, Elizabeth
Stolcke, Andreas
Hillard, Dustin
Ostendorf, Mari
Harper, Mary
Affiliations
[1] Univ Texas, Dept Comp Sci, Richardson, TX 75080 USA
[2] SRI Int, Menlo Pk, CA 94025 USA
[3] Int Comp Sci Inst, Berkeley, CA 94704 USA
[4] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
[5] Univ Maryland, College Pk, MD 20742 USA
Funding
National Science Foundation (USA);
Keywords
conditional random field; confusion network; disfluency; maximum entropy; metadata extraction; prosody; punctuation; rich transcription; sentence boundary;
DOI
10.1109/TASL.2006.878255
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
Effective human and automatic processing of speech requires recovery of more than just the words. It also involves recovering phenomena such as sentence boundaries, filler words, and disfluencies, referred to as structural metadata. We describe a metadata detection system that combines information from different types of textual knowledge sources with information from a prosodic classifier. We investigate maximum entropy and conditional random field models, as well as the predominant hidden Markov model (HMM) approach, and find that discriminative models generally outperform generative models. We report system performance on both broadcast news and conversational telephone speech tasks, illustrating significant performance differences across tasks and as a function of recognizer performance. The results represent the state of the art, as assessed in the NIST RT-04F evaluation.
Pages: 1526-1540
Number of pages: 15