Using prosody for automatic sentence segmentation of multi-party meetings

Authors
Kolar, Jachym [1]
Shriberg, Elizabeth
Liu, Yang
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
[2] Univ W Bohemia, Dept Cybernet, Plzen, Czech Republic
[3] SRI Int, Menlo Pk, CA 94025 USA
[4] Univ Texas Dallas, Dallas, TX 75230 USA
Source
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer (ASR). We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that (1) information from pauses is important, including pause duration both at the boundary and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.
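Result (1) refers to pause duration at the candidate boundary and at the previous and following word boundaries. The sketch below shows one way such pause features could be derived from word-level time alignments. It is an illustration under stated assumptions, not the authors' feature extraction: the `Word` record, the `pause_features` function, the feature names, and the choice of 0.0 for a missing neighboring boundary are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Union


class Word(NamedTuple):
    """Hypothetical word record with start/end times in seconds."""
    text: str
    start: float
    end: float


def pause_features(words: List[Word]) -> List[Dict[str, Union[str, float]]]:
    """Return one feature dict per inter-word boundary.

    Boundary i lies between words[i] and words[i + 1]. Each dict holds the
    pause at that boundary plus the pauses at the neighboring boundaries.
    """
    # Pause after word i: gap between its end and the next word's start.
    # Overlapping alignments are clamped to zero (an assumption).
    pauses = [
        max(0.0, words[i + 1].start - words[i].end)
        for i in range(len(words) - 1)
    ]
    feats = []
    for i, pause in enumerate(pauses):
        feats.append({
            "word": words[i].text,
            # Missing neighbors at the edges default to 0.0 (an assumption).
            "pause_prev": pauses[i - 1] if i > 0 else 0.0,
            "pause_here": pause,
            "pause_next": pauses[i + 1] if i + 1 < len(pauses) else 0.0,
        })
    return feats


if __name__ == "__main__":
    example = [
        Word("okay", 0.0, 0.4),
        Word("so", 1.0, 1.2),
        Word("we", 1.2, 1.4),
        Word("start", 1.5, 2.0),
    ]
    for row in pause_features(example):
        print(row)
```

Feature vectors of this kind could then be passed to a boundary classifier alongside lexical features; the duration, pitch, and energy features mentioned in the abstract would need their own extraction steps, which are not sketched here.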
Pages: 629-636
Number of pages: 8