Experimenting with lipreading for large vocabulary continuous speech recognition

Cited by: 4

Author:

Palecek, Karel [1]

Affiliation:

[1] Tech Univ Liberec, Inst Informat Technol & Elect, Liberec 46117, Czech Republic

Source:

JOURNAL ON MULTIMODAL USER INTERFACES | 2018 / Vol. 12 / Issue 04

Keywords:

Audiovisual speech recognition; Lipreading; LVCSR; AUDIOVISUAL SPEECH;

DOI:

10.1007/s12193-018-0266-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The vast majority of current research on audiovisual speech recognition via lipreading from frontal face videos focuses on simple cases such as isolated phrase recognition or structured speech, where the vocabulary is limited to several tens of units. In this paper, we diverge from these traditional applications and investigate the effect of incorporating visual and depth information in the task of continuous speech recognition with vocabulary sizes ranging from several hundred to half a million words. To this end, we evaluate various visual speech parametrizations, both existing and novel, that are designed to capture different kinds of information in the video and depth signals. The experiments are conducted on a moderately sized dataset of 54 speakers, each uttering 100 sentences in the Czech language. Both the video and depth data were captured by the Microsoft Kinect device. We show that even for large vocabularies the visual signal contains enough information to improve the word accuracy by up to 22% relative to acoustic-only recognition. Somewhat surprisingly, a relative improvement of up to 16% has also been reached using the interpolated depth data.


Pages: 309 - 318

Page count: 10

50 records in total

[21] A review of large-vocabulary continuous-speech recognition
Young, S
IEEE SIGNAL PROCESSING MAGAZINE, 1996, 13 (05) : 45 - 57
[22] A LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION SYSTEM WITH HIGH PREDICTABILITY
SHIGENAGA, M
SEKIGUCHI, Y
YAMAGUCHI, T
MASUDA, R
IEICE TRANSACTIONS ON COMMUNICATIONS ELECTRONICS INFORMATION AND SYSTEMS, 1991, 74 (07) : 1817 - 1825
[23] Feature selection in mandarin large vocabulary continuous speech recognition
Zhu, X
Chen, YN
Liu, J
Liu, RS
2002 6TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS I AND II, 2002 : 508 - 511
[24] DISTRIBUTED SUBMODULAR MAXIMIZATION FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
Qi, Jun
Liu, Xu
Kamijo, Shunsuke
Tejedor, Javier
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 2501 - 2505
[25] Using a transcription graph for large vocabulary continuous speech recognition
Li, Z
O'Shaughnessy, D
1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996 : 121 - 124
[26] A word graph algorithm for large vocabulary continuous speech recognition
Ortmanns, S
Ney, H
Aubert, X
COMPUTER SPEECH AND LANGUAGE, 1997, 11 (01) : 43 - 72
[27] A large vocabulary continuous speech recognition system for Persian language
Hossein Sameti
Hadi Veisi
Mohammad Bahrani
Bagher Babaali
Khosro Hosseinzadeh
EURASIP Journal on Audio, Speech, and Music Processing, 2011
[28] Large-vocabulary continuous speech recognition: Advances and applications
Gauvain, JL
Lamel, L
PROCEEDINGS OF THE IEEE, 2000, 88 (08) : 1181 - 1200
[29] DEEP-FSMN FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION
Zhang, Shiliang
Lei, Ming
Yan, Zhijie
Dai, Lirong
2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018 : 5869 - 5873
[30] A large-vocabulary continuous speech recognition system for Hindi
Kumar, M
Rajput, N
Verma, A
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2004, 48 (5-6) : 703 - 715
