Speaker-Conditioned Hierarchical Modeling for Automated Speech Scoring

Cited by: 2
Authors
Singla, Yaman Kumar [1 ,2 ,3 ]
Gupta, Avyakt [1 ]
Bagga, Shaurya [1 ]
Chen, Changyou [3 ]
Krishnamurthy, Balaji [2 ]
Shah, Rajiv Ratn [1 ]
Affiliations
[1] IIIT Delhi, New Delhi, India
[2] Adobe, New Delhi, India
[3] SUNY Buffalo, Buffalo, NY USA
Keywords
automated speech scoring; spontaneous speech; end-to-end neural architectures; hierarchical modeling; multi-modal; interpretability; AI in education
DOI
10.1145/3459637.3482395
Chinese Library Classification
TP [automation technology; computer technology]
Discipline Classification Code
0812
Abstract
Automatic Speech Scoring (ASS) is the computer-assisted evaluation of a candidate's speaking proficiency in a language. ASS systems face many challenges, such as open grammar, variable pronunciations, and unstructured or semi-structured content. Recent deep learning approaches have shown some promise in this domain. However, most of these approaches extract features from a single audio response, so they lack the speaker-specific context required to model such a complex task. We propose a novel deep learning technique for non-native ASS, called speaker-conditioned hierarchical modeling. Our technique takes advantage of the fact that oral proficiency tests rate multiple responses from each candidate: we extract context vectors from these responses and feed them as additional speaker-specific context to our network when scoring a particular response. We compare our technique with strong baselines and find that such modeling improves the model's average performance by 6.92% (maximum = 12.86%, minimum = 4.51%). We further provide both quantitative and qualitative insights into the importance of this additional context in solving the problem of ASS.
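The abstract describes conditioning the score of one response on a speaker-level context built from the candidate's other responses. The sketch below is a minimal, hypothetical PyTorch illustration of that idea only; the GRU encoder, mean-pooling aggregation, layer sizes, and all names are assumptions for illustration and not the authors' implementation.

```python
# Illustrative sketch (not the authors' implementation): score one response
# conditioned on a speaker context pooled from the same candidate's other responses.
import torch
import torch.nn as nn


class SpeakerConditionedScorer(nn.Module):
    def __init__(self, feat_dim: int = 128, hidden_dim: int = 256):
        super().__init__()
        # Encodes one response (a sequence of per-frame features) into a vector.
        self.response_encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        # Scores a response from its own encoding plus the speaker context vector.
        self.scorer = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def encode(self, response: torch.Tensor) -> torch.Tensor:
        # response: (batch, time, feat_dim) -> (batch, hidden_dim)
        _, h_n = self.response_encoder(response)
        return h_n[-1]

    def forward(self, target: torch.Tensor, other_responses: list) -> torch.Tensor:
        # Speaker context: mean of the encodings of the candidate's other responses
        # (the pooling choice here is an assumption).
        context = torch.stack([self.encode(r) for r in other_responses]).mean(dim=0)
        target_enc = self.encode(target)
        return self.scorer(torch.cat([target_enc, context], dim=-1))


# Example: score one response given two other responses from the same candidate.
model = SpeakerConditionedScorer()
target = torch.randn(1, 50, 128)                       # response being scored
others = [torch.randn(1, 40, 128), torch.randn(1, 60, 128)]
print(model(target, others).shape)                     # torch.Size([1, 1])
```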
Pages: 1681-1691
Number of pages: 11