Using prosody for automatic sentence segmentation of multi-party meetings

Cited by: 0
Authors
Kolar, Jachym [1]
Shriberg, Elizabeth
Liu, Yang
Affiliations
[1] Int Comp Sci Inst, Berkeley, CA 94704 USA
[2] Univ W Bohemia, Dept Cybernet, Plzen, Czech Republic
[3] SRI Int, Menlo Pk, CA 94025 USA
[4] Univ Texas Dallas, Dallas, TX 75230 USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer (ASR). We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that (1) information from pauses is important, including pause duration both at the boundary and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.
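The pause features in finding (1) and the idea of score-level combination can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `pause_features` and `combine_scores`, the word-timing format, and the `weight` parameter are all hypothetical, and the log-linear interpolation shown here is a simplified stand-in for the HMM-based combination described in the abstract.

```python
import math
from typing import List, Tuple

# Hypothetical input format: (token, start time in seconds, end time in seconds).
Word = Tuple[str, float, float]


def pause_features(words: List[Word]) -> List[Tuple[float, float, float]]:
    """For each inter-word boundary, return (previous pause, pause, following pause).

    Boundary i lies between words[i] and words[i + 1]. Missing neighbours at
    the edges of the word sequence are filled with 0.0 (an assumption).
    """
    pauses = [
        max(0.0, words[i + 1][1] - words[i][2])  # next start minus current end
        for i in range(len(words) - 1)
    ]
    feats = []
    for i, pause in enumerate(pauses):
        prev_pause = pauses[i - 1] if i > 0 else 0.0
        next_pause = pauses[i + 1] if i + 1 < len(pauses) else 0.0
        feats.append((prev_pause, pause, next_pause))
    return feats


def combine_scores(p_lex: float, p_pros: float, weight: float = 0.5) -> float:
    """Log-linearly interpolate two boundary posteriors.

    Simplified stand-in for score-level combination; the result is
    renormalized over the two classes {boundary, no boundary}.
    """
    eps = 1e-12
    p_lex = min(max(p_lex, eps), 1.0 - eps)
    p_pros = min(max(p_pros, eps), 1.0 - eps)
    yes = weight * math.log(p_lex) + (1.0 - weight) * math.log(p_pros)
    no = weight * math.log(1.0 - p_lex) + (1.0 - weight) * math.log(1.0 - p_pros)
    return 1.0 / (1.0 + math.exp(no - yes))


if __name__ == "__main__":
    words = [("okay", 0.0, 0.3), ("so", 0.9, 1.0), ("we", 1.0, 1.2), ("start", 1.25, 1.6)]
    for (prev_p, p, next_p), (token, _, _) in zip(pause_features(words), words):
        print(f"after {token!r}: prev={prev_p:.2f} pause={p:.2f} next={next_p:.2f}")
    print(combine_scores(0.7, 0.9))
```

A feature-level approach such as the boosting-based model would instead pass these pause values, together with lexical and other prosodic features, to a single classifier rather than combining two separately trained models' scores.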
Pages: 629-636
Number of pages: 8