Improving Mandarin Prosody Boundary Detection by Using Phonetic Information and Deep LSTM Model

Times cited: 0
Authors
Lin, Ju [1, 2]
Ji, Zhuanzhuan [3]
Dong, Wenwei [3]
Xie, Yanlu [3]
Zhang, Jinsong [3]
Affiliations
[1] Beijing Language & Culture Univ, Beijing, Peoples R China
[2] Clemson Univ, Clemson, SC USA
[3] Beijing Language & Culture Univ, Beijing Adv Innovat Ctr Language Resources, Beijing, Peoples R China
Keywords
Prosodic boundary detection; articulatory information; sequence labeling; LSTM
DOI
10.1109/ialp48816.2019.9037697
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Automatic prosodic boundary detection is useful for speech processing tasks such as automatic speech recognition (ASR) and speech synthesis. In this paper, we propose two techniques to improve boundary detection performance. First, in addition to prosodic features (e.g., pitch, duration, and energy), phonetic information (word or articulatory information) is integrated into the prosodic boundary detection framework. We compare two forms of phonetic information: word form and articulatory form. Second, boundary detection is treated as a sequence labeling task, and a deep Long Short-Term Memory (LSTM) network is adopted in place of the traditional Deep Neural Network (DNN) model. Experimental results show that the additional phonetic information improves boundary detection performance, with relative improvements of 5.9% (word form) and 9.8% (articulatory form) over the system that used only prosodic features. Modeling articulatory information and prosodic features with the deep LSTM achieved the best result, raising performance from 76.35% to 77.85% compared with the same features modeled by a DNN: an absolute gain of 1.5 points, or a 6.3% relative reduction in the remaining error.
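The abstract frames boundary detection as sequence labeling over an input that fuses prosodic features with phonetic information. The following is a minimal, untrained, pure-Python sketch of that general setup, not the authors' model. The articulatory attribute set (`ARTIC_ATTRS`), the feature values, the layer sizes, the per-syllable binary labeling scheme, and all helper names are hypothetical assumptions; the weights are random rather than learned.

```python
import math
import random

# HYPOTHETICAL articulatory attribute inventory (illustrative, not the paper's set).
ARTIC_ATTRS = ("vowel", "nasal", "stop", "fricative", "voiced")


def fuse_features(prosody, attrs):
    """Concatenate one syllable's prosodic values (e.g. pitch, duration, energy)
    with a binary articulatory vector over ARTIC_ATTRS."""
    return list(prosody) + [1.0 if a in attrs else 0.0 for a in ARTIC_ATTRS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LSTMLayer:
    """One unidirectional LSTM layer with small random (untrained) weights."""

    def __init__(self, n_in, n_hidden, rng):
        self.n_hidden = n_hidden

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # One (W, U, b) triple per gate: input i, forget f, output o, candidate g.
        self.gates = {
            name: (rand_matrix(n_hidden, n_in), rand_matrix(n_hidden, n_hidden), [0.0] * n_hidden)
            for name in ("i", "f", "o", "g")
        }

    def _pre(self, name, x, h):
        """Pre-activation W x + U h + b for one gate."""
        w, u, b = self.gates[name]
        return [p + q + r for p, q, r in zip(matvec(w, x), matvec(u, h), b)]

    def forward(self, xs):
        """Run the layer over a sequence and return the hidden state at every step."""
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        outputs = []
        for x in xs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]
            f = [sigmoid(z) for z in self._pre("f", x, h)]
            o = [sigmoid(z) for z in self._pre("o", x, h)]
            g = [math.tanh(z) for z in self._pre("g", x, h)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outputs.append(h)
        return outputs


class DeepLSTMTagger:
    """Stacked LSTM layers plus a logistic output giving, for each syllable,
    the probability that a prosodic boundary follows it."""

    def __init__(self, n_in, n_hidden=8, n_layers=2, seed=0):
        rng = random.Random(seed)
        sizes = [n_in] + [n_hidden] * n_layers
        self.layers = [LSTMLayer(sizes[k], sizes[k + 1], rng) for k in range(n_layers)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def predict_proba(self, xs):
        for layer in self.layers:  # each layer consumes the previous layer's outputs
            xs = layer.forward(xs)
        return [sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out) for h in xs]

    def label(self, xs, threshold=0.5):
        """1 = boundary after this syllable, 0 = no boundary."""
        return [int(p >= threshold) for p in self.predict_proba(xs)]


# Three syllables with made-up, already-normalised pitch/duration/energy values.
utterance = [
    fuse_features([0.2, -0.1, 0.3], {"vowel", "voiced"}),
    fuse_features([-0.4, 0.8, -0.2], {"nasal", "voiced"}),
    fuse_features([0.1, 1.2, -0.6], {"stop"}),
]
tagger = DeepLSTMTagger(n_in=len(utterance[0]))
print(tagger.label(utterance))  # one 0/1 label per syllable; arbitrary, since untrained
```

Stacking layers (`n_layers`) is what makes the LSTM "deep". Unlike a DNN that classifies each unit from its own feature vector, the recurrent state lets each decision depend on the preceding syllables. For the word-form variant, the articulatory vector would be replaced by a word representation such as a one-hot or embedding vector. A practical system would train such a model in a framework (for example PyTorch's `torch.nn.LSTM` with `num_layers`) rather than with hand-written loops.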
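The 6.3% figure for the LSTM-versus-DNN comparison cannot be a relative gain in the score itself, because 1.5 points is only about 2.0% of 76.35. It does match the relative reduction of the remaining error (100 minus the score). The check below assumes both scores are percentages on a 0 to 100 scale, since the abstract does not name the metric; the helper names are illustrative.

```python
def relative_gain(baseline, improved):
    """Increase expressed as a percentage of the baseline score."""
    return 100.0 * (improved - baseline) / baseline


def relative_error_reduction(baseline, improved):
    """Increase expressed as a percentage of the baseline's remaining error (100 - score)."""
    return 100.0 * (improved - baseline) / (100.0 - baseline)


dnn_score, lstm_score = 76.35, 77.85
print(f"relative gain in score:   {relative_gain(dnn_score, lstm_score):.1f}%")             # → 2.0%
print(f"relative error reduction: {relative_error_reduction(dnn_score, lstm_score):.1f}%")  # → 6.3%
```

The same reading may apply to the 5.9% and 9.8% figures, but the abstract does not report the prosody-only baseline needed to verify them.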
Pages: 504-508
Number of pages: 5