Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks

Cited by: 34
Authors
Li, Kun [1, 2]
Mao, Shaoguang [3]
Li, Xu [1]
Wu, Zhiyong [3]
Meng, Helen [1, 3]
Affiliations
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] SpeechX Ltd, Beijing, Peoples R China
[3] Tsinghua Univ, Grad Sch Shenzhen, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Beijing, Peoples R China
Keywords
Lexical stress; Pitch accent; Non-native English; Language learning; Deep neural networks; Prominence; Features
DOI
10.1016/j.specom.2017.11.003
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
This paper investigates the use of multi-distribution deep neural networks (MD-DNNs) for automatic lexical stress detection and pitch accent detection, both of which are useful for suprasegmental mispronunciation detection and diagnosis in second-language (L2) English speech. The features cover syllable-based prosodic features (maximum syllable loudness, syllable nucleus duration and a pair of dynamic pitch values) as well as lexical and syntactic features (encoded as binary variables). Because stressed or accented syllables are more prominent than their neighbors, the two preceding and two following syllables are also taken into account. Experimental results show that the MD-DNN for lexical stress detection achieves an accuracy of 87.9% in syllable classification (primary/secondary/no stress) for words with three or more syllables. This is considerably better than our previous work using Gaussian mixture models (GMMs) and the prominence model (PM), whose accuracies are 72.1% and 76.3%, respectively. Built with the same approach as the lexical stress detector, the pitch accent detector obtains an accuracy of 90.2%, outperforming the GMMs and the PM by about 9.6% and 6.9%, respectively.
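The context idea in the abstract (each syllable described together with its two preceding and two following syllables) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stack_context`, the five-value feature layout, the sample numbers, and the zero-padding at word edges are all assumptions made for illustration only.

```python
from typing import List, Sequence


def stack_context(
    syllables: Sequence[Sequence[float]], left: int = 2, right: int = 2
) -> List[List[float]]:
    """Concatenate each syllable's features with those of its neighbors.

    Positions that fall outside the sequence are filled with zeros
    (an assumption; the abstract does not say how edges are handled).
    """
    if not syllables:
        return []
    dim = len(syllables[0])
    pad = [0.0] * dim
    stacked = []
    for i in range(len(syllables)):
        row: List[float] = []
        # Walk the window from `left` syllables before to `right` after.
        for j in range(i - left, i + right + 1):
            frame = syllables[j] if 0 <= j < len(syllables) else pad
            row.extend(frame)
        stacked.append(row)
    return stacked


# Hypothetical per-syllable features for a three-syllable word:
# [max loudness, nucleus duration (s), pitch start, pitch end, binary lexical flag]
word = [
    [0.4, 0.08, 110.0, 115.0, 1.0],
    [0.9, 0.15, 120.0, 150.0, 1.0],
    [0.3, 0.06, 105.0, 100.0, 1.0],
]

windows = stack_context(word)
print(len(windows), len(windows[0]))
```

Each row of `windows` would then serve as the input vector for one syllable, so a classifier sees the target syllable's prominence relative to its neighbors rather than in isolation.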
Pages: 28-36
Number of pages: 9