Spoken Language Understanding of Human-Machine Conversations for Language Learning Applications

Cited by: 5

Authors:

Qian, Yao [1]

Ubale, Rutuja [1]

Lange, Patrick [1]

Evanini, Keelan [2]

Ramanarayanan, Vikram [1,3]

Soong, Frank K. [4]

Affiliations:

[1] Educational Testing Service Research, San Francisco, CA 94134, USA

[2] Educational Testing Service Research, Princeton, NJ, USA

[3] University of California, San Francisco, San Francisco, CA 94143, USA

[4] Microsoft Research Asia, Beijing, People's Republic of China

Source:

JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY | 2020, Vol. 92, No. 8

Keywords:

Spoken language understanding; Human-machine conversational systems; Computer assisted language learning; End-to-end modeling; Education

DOI:

10.1007/s11265-019-01484-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Spoken language understanding (SLU) in human-machine conversational systems is the process of interpreting the semantic meaning conveyed by a user's spoken utterance. Traditional SLU approaches transform the word string transcribed by an automatic speech recognition (ASR) system into a semantic label that determines the machine's subsequent response. However, the robustness of SLU results can suffer in the context of a human-machine conversation-based language learning system due to the presence of ambient noise, heavily accented pronunciation, ungrammatical utterances, etc. To address these issues, this paper proposes an end-to-end (E2E) modeling approach for SLU and evaluates the semantic labeling performance of a bidirectional LSTM-RNN (BLSTM-RNN) with input at three different levels: acoustic (filterbank features), phonetic (subphone posteriorgrams), and lexical (ASR hypotheses). Experimental results for spoken responses collected in a dialog application designed for English learners to practice job interviewing skills show that multi-level BLSTM-RNNs can utilize complementary information from the three different levels to improve the semantic labeling performance. An analysis of results on out-of-vocabulary (OOV) utterances, which can be common in a conversation-based dialog system, also indicates that using subphone posteriorgrams outperforms using ASR hypotheses, and that incorporating the lower-level features for semantic labeling can improve the final SLU performance.

Pages: 805-817

Page count: 13
