Automatic lexical stress and pitch accent detection for L2 English speech using multi-distribution deep neural networks
Cited by: 34
Authors:
Li, Kun [1,2]
Mao, Shaoguang [3]
Li, Xu [1]
Wu, Zhiyong [3]
Meng, Helen [1,3]
Affiliations:
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Hong Kong, Hong Kong, Peoples R China
[2] SpeechX Ltd, Beijing, Peoples R China
[3] Tsinghua Univ, Grad Sch Shenzhen, Tsinghua CUHK Joint Res Ctr Media Sci Technol & S, Beijing, Peoples R China
Keywords:
Lexical stress;
Pitch accent;
Non-native English;
Language learning;
Deep neural networks;
PROMINENCE;
FEATURES;
DOI:
10.1016/j.specom.2017.11.003
Chinese Library Classification (CLC):
O42 [Acoustics];
Discipline classification codes:
070206 ;
082403 ;
Abstract:
This paper investigates the use of multi-distribution deep neural networks (MD-DNNs) for automatic lexical stress detection and pitch accent detection, both of which are useful for suprasegmental mispronunciation detection and diagnosis in second-language (L2) English speech. The features cover syllable-based prosodic features (including maximum syllable loudness, syllable nucleus duration and a pair of dynamic pitch values) as well as lexical and syntactic features encoded as binary variables. Because stressed or accented syllables are more prominent than their neighbors, the two preceding and two following syllables are also taken into account. Experimental results show that the MD-DNN for lexical stress detection achieves an accuracy of 87.9% in syllable classification (primary/secondary/no stress) for words with three or more syllables. This is substantially higher than the accuracies of our previous work using Gaussian mixture models (GMMs) and the prominence model (PM), which are 72.1% and 76.3% respectively. The pitch accent detector, built with the same approach as the lexical stress detector, obtains an accuracy of 90.2%, which is better than the results obtained with the GMMs and PM by about 9.6% and 6.9% respectively.
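As a rough illustration of the input described in the abstract, the sketch below assembles one syllable's feature vector together with those of its two preceding and two following syllables. It is not the authors' implementation: the `Syllable` class, its field names, the `context_window` helper, and the zero-padding at sequence edges are all assumptions made for the example, and the network itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Syllable:
    """Per-syllable features named in the abstract (field names are hypothetical)."""
    max_loudness: float
    nucleus_duration: float
    pitch_pair: Sequence[float]   # the "pair of dynamic pitch values"
    binary_flags: Sequence[int]   # lexical/syntactic features encoded as 0/1

    def vector(self) -> List[float]:
        # Real-valued prosodic features first, then the binary features.
        return [self.max_loudness, self.nucleus_duration,
                *self.pitch_pair, *map(float, self.binary_flags)]


def context_window(syllables: Sequence[Syllable], index: int,
                   width: int = 2) -> List[float]:
    """Concatenate the vectors of syllables index-width .. index+width.

    Positions outside the sequence are zero-padded. This is an assumption:
    the abstract does not say how edges are handled.
    """
    dim = len(syllables[index].vector())
    out: List[float] = []
    for pos in range(index - width, index + width + 1):
        if 0 <= pos < len(syllables):
            out.extend(syllables[pos].vector())
        else:
            out.extend([0.0] * dim)
    return out


# Made-up values for a three-syllable word, two binary flags per syllable.
word = [
    Syllable(0.4, 0.08, (1.0, -0.5), (0, 1)),
    Syllable(0.9, 0.15, (2.0, 0.5), (1, 1)),
    Syllable(0.3, 0.06, (-1.0, -1.5), (0, 0)),
]
features = context_window(word, index=1)
```

With six values per syllable and a window of five syllables, `features` holds 30 values; the first and last six are padding because the word has only one neighbor on each side of the middle syllable.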