DNN Online with iVectors Acoustic Modeling and Doc2Vec Distributed Representations for Improving Automated Speech Scoring

Cited by: 1
Authors
Tao, Jidong [1 ]
Chen, Lei [1 ]
Lee, Chong Min [1 ]
Affiliations
[1] Educ Testing Serv, 660 Rosedale Rd, Princeton, NJ 08541 USA
Keywords
automated speech scoring; non-native spontaneous speech; automatic speech recognition; unsupervised language model adaptation; content vector analysis; Doc2Vec;
DOI
10.21437/Interspeech.2016-1457
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
When applying automated speech-scoring technology to the rating of globally administered real assessments, there are several practical challenges: (a) ASR accuracy on non-native spontaneous speech is generally low; (b) due to the data mismatch between an ASR system's training stage and its final usage, the recognition accuracy obtained in practice is even lower; (c) content-relevance features have not been widely used in operational scoring models due to various technical and logistical issues. For this paper, an ASR system based on a multi-splice deep neural network (DNN) architecture with iVectors was trained, achieving a 19.1% word error rate (WER). Second, we applied language model (LM) adaptation to the prompts that were not covered in ASR training, using spoken responses acquired from previous operational tests, and reduced the WER by more than 8% relative. The improved ASR performance boosts scoring performance without any extra human annotation cost. Finally, the developed ASR system allowed us to apply content features in practice. In addition to the conventional frequency-based approach, content vector analysis (CVA), we also explored distributed representations with Doc2Vec and found an improvement in content measurement.
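As a rough illustration of the two content-measurement approaches named in the abstract, the sketch below computes a frequency-based CVA feature (TF-IDF vectors with cosine similarity) and a Doc2Vec feature for one test response against a small set of reference responses. It assumes the gensim and scikit-learn libraries; the toy responses, hyperparameters, and the use of a simple maximum-similarity feature are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch of the two content-scoring approaches mentioned in the abstract:
# frequency-based content vector analysis (CVA) via TF-IDF, and distributed
# representations via Doc2Vec. All data and settings are placeholders.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical high-scoring reference responses for one prompt (in practice,
# ASR transcriptions of responses rated at the top score level).
reference_responses = [
    "the company should invest in employee training because it builds skills",
    "investing in training improves productivity and keeps employees motivated",
]
test_response = "training employees helps the company keep skilled workers"

# CVA baseline: TF-IDF vectors, cosine similarity to the references.
tfidf = TfidfVectorizer()
ref_matrix = tfidf.fit_transform(reference_responses)
test_vec = tfidf.transform([test_response])
cva_feature = cosine_similarity(test_vec, ref_matrix).max()

# Doc2Vec: train paragraph vectors on the references, infer one for the test.
tagged = [TaggedDocument(simple_preprocess(r), [i])
          for i, r in enumerate(reference_responses)]
d2v = Doc2Vec(vector_size=50, min_count=1, epochs=40, seed=1)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)

test_d2v = d2v.infer_vector(simple_preprocess(test_response))
# gensim >= 4.0: trained document vectors are accessed through model.dv
ref_d2v = np.vstack([d2v.dv[i] for i in range(len(reference_responses))])
d2v_feature = cosine_similarity(test_d2v.reshape(1, -1), ref_d2v).max()

print(f"CVA feature: {cva_feature:.3f}, Doc2Vec feature: {d2v_feature:.3f}")
```

In an operational setting, reference vectors would typically be built per prompt from responses at each score level, and the resulting similarity values would enter the scoring model as content features alongside delivery and language-use features.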
Pages: 3117 - 3121
Number of pages: 5
Related papers
1 record
  • [1] Automated Scoring of Interview Videos using Doc2Vec Multimodal Feature Extraction Paradigm
    Chen, Lei; Feng, Gary; Leong, Chee Wee; Lehman, Blair; Martin-Raugh, Michelle; Kell, Harrison; Lee, Chong Min; Yoon, Su-Youn
    ICMI'16: Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016: 161 - 168