Joint prediction of punctuation and disfluency in speech transcripts

Cited by: 1
Authors
Lin, Binghuai [1]
Wang, Liyuan [1]
Affiliations
[1] Tencent Technology Co., Ltd., Smart Platform Product Dept., Beijing, People's Republic of China
Source
INTERSPEECH 2020
Keywords
Punctuation prediction; disfluency prediction; joint prediction; MTL; attention
DOI
10.21437/Interspeech.2020-1277
Chinese Library Classification (CLC)
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline codes
100104; 100213
Abstract
Spoken language transcripts generated by automatic speech recognition (ASR) often contain many disfluencies and lack punctuation. Restoring punctuation and removing disfluencies from these transcripts facilitates downstream tasks such as machine translation, information extraction, and syntactic analysis [1]. Various studies have shown that the two tasks influence each other and have therefore modeled them in a multi-task learning (MTL) framework [2, 3], which learns general representations in shared layers and separate representations in task-specific layers. However, task dependencies are normally ignored in the task-specific layers. To model these dependencies, we propose an attention-based structure in the task-specific layers of the MTL framework, incorporating pretrained BERT (a state-of-the-art NLP model) [4]. Experimental results on the English IWSLT dataset and the Switchboard dataset show that the proposed architecture outperforms both separate modeling and traditional MTL methods.
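The abstract sketches the architecture only at a high level: shared BERT layers feed two task-specific branches, and attention between the branches models the task dependencies. Below is a minimal PyTorch sketch of that idea, assuming Hugging Face's transformers library; the class name, layer names, and the specific use of cross-task multi-head attention are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch only: shared BERT encoder, task-specific projections,
# and cross-task attention so each head can condition on the other
# task's states. All names here are hypothetical, not from the paper.
import torch
import torch.nn as nn
from transformers import BertModel

class JointPunctDisfluency(nn.Module):
    def __init__(self, n_punct_labels, n_disfl_labels,
                 bert_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(bert_name)  # shared layers
        hidden = self.encoder.config.hidden_size
        # Task-specific layers: one projection per task.
        self.punct_proj = nn.Linear(hidden, hidden)
        self.disfl_proj = nn.Linear(hidden, hidden)
        # Attention across the two task-specific representations,
        # intended to capture the task dependencies the abstract mentions.
        self.punct_attn = nn.MultiheadAttention(hidden, num_heads=8,
                                                batch_first=True)
        self.disfl_attn = nn.MultiheadAttention(hidden, num_heads=8,
                                                batch_first=True)
        self.punct_out = nn.Linear(hidden, n_punct_labels)
        self.disfl_out = nn.Linear(hidden, n_disfl_labels)

    def forward(self, input_ids, attention_mask):
        shared = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        p = torch.tanh(self.punct_proj(shared))  # punctuation branch
        d = torch.tanh(self.disfl_proj(shared))  # disfluency branch
        # Each branch queries the other branch's token states.
        p_ctx, _ = self.punct_attn(query=p, key=d, value=d)
        d_ctx, _ = self.disfl_attn(query=d, key=p, value=p)
        # Per-token label logits for each task.
        return self.punct_out(p + p_ctx), self.disfl_out(d + d_ctx)
```

In the usual MTL setup, such a model would be trained by summing a token-level cross-entropy loss for each task over the shared input sequence.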
Pages: 716-720 (5 pages)
Related papers
50 items in total; first 10 shown below
  • [1] LSTM for Punctuation Restoration in Speech Transcripts
    Tilk, Ottokar
    Alumae, Tanel
    INTERSPEECH 2015, pp. 683-687
  • [2] Visualizing Punctuation Restoration in Speech Transcripts with Prosograph
    Oktem, Alp
    Farrus, Mireia
    Bonafonte, Antonio
    INTERSPEECH 2018, pp. 1493-1494
  • [3] Four-in-One: A Joint Approach to Inverse Text Normalization, Punctuation, Capitalization, and Disfluency for Automatic Speech Recognition
    Tan, Sharman
    Behre, Piyush
    Kibre, Nick
    Alphonso, Issac
    Chang, Shuangyu
    2022 IEEE Spoken Language Technology Workshop (SLT), pp. 677-684
  • [4] Is a coherent punctuation system possible for spontaneous speech transcripts?
    Deulofeu, Henri-Jose
    Langue Française, 2011, no. 172, pp. 115+
  • [5] Joint Prediction of Truecasing and Punctuation for Conversational Speech in Low-Resource Scenarios
    Pappagari, Raghavendra
    Zelasko, Piotr
    Mikolajczyk, Agnieszka
    Pezik, Piotr
    Dehak, Najim
    2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), pp. 1185-1191
  • [6] Streaming Joint Speech Recognition and Disfluency Detection
    Futami, Hayato
    Tsunoo, Emiru
    Shibata, Kentaro
    Kashiwagi, Yosuke
    Okuda, Takao
    Arora, Siddhant
    Watanabe, Shinji
    arXiv preprint, 2022
  • [7] Punctuation Prediction Model for Conversational Speech
    Zelasko, Piotr
    Szymanski, Piotr
    Mizgajski, Jan
    Szymczak, Adrian
    Carmiel, Yishay
    Dehak, Najim
    INTERSPEECH 2018, pp. 2633-2637
  • [8] Leveraging Prosody for Punctuation Prediction of Spontaneous Speech
    Cho, Jenny Yeonjin
    Ng, Sara
    Tran, Trang
    Ostendorf, Mari
    INTERSPEECH 2022, pp. 555-559
  • [9] Investigating for Punctuation Prediction in Chinese Speech Transcriptions
    Liu, Xin
    Liu, Yi
    Song, Xiao
    2018 International Conference on Asian Language Processing (IALP), pp. 74-78
  • [10] Bilingual Experiments on Automatic Recovery of Capitalization and Punctuation of Automatic Speech Transcripts
    Batista, Fernando
    Moniz, Helena
    Trancoso, Isabel
    Mamede, Nuno
    IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(2), pp. 474-485