Asynchronous Multimodal Text Entry using Speech and Gesture Keyboards

Cited by: 0
Authors
Kristensson, Per Ola [1 ]
Vertanen, Keith [2 ]
Affiliations
[1] Univ St Andrews, Sch Comp Sci, St Andrews KY16 9AJ, Fife, Scotland
[2] Princeton Univ, Dept Comp Sci, Princeton, NJ USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
mobile text entry; multimodal interfaces;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We propose reducing errors in text entry by combining speech and gesture keyboard input. We describe a merge model that combines recognition results in an asynchronous and flexible manner. We collected speech and gesture data from users entering both short email sentences and web search queries. By merging recognition results from both modalities, the word error rate was reduced by 53% relative for email sentences and 29% relative for web searches. For email utterances with speech errors, we investigated providing gesture keyboard corrections of only the erroneous words. Without the user explicitly indicating the incorrect words, our model was able to reduce the word error rate by 44% relative.
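The paper's merge model is not reproduced in this record, but the core idea, combining word-level hypotheses from a speech recognizer and a gesture keyboard recognizer so that each modality can correct the other's errors, can be sketched briefly. The Python sketch below is illustrative only: the function names, the toy candidate distributions, the pre-aligned word slots, and the log-linear interpolation weight are all assumptions, not the authors' actual model, which merges asynchronous, unaligned recognition results.

```python
# Minimal sketch (not the authors' actual merge model): combining
# per-word candidate distributions from a speech recognizer and a
# gesture keyboard recognizer. Each modality supplies, for every
# word slot, a dict mapping word -> probability; the merge keeps
# the word with the best interpolated log score.
import math

def merge_word_slot(speech_cands, gesture_cands, speech_weight=0.5):
    """Pick the best word for one slot by log-linear interpolation.

    speech_weight is a free parameter here; in practice it would be
    tuned on held-out data.
    """
    floor = 1e-9  # probability floor for words seen by only one modality
    best_word, best_score = None, -math.inf
    for word in set(speech_cands) | set(gesture_cands):
        score = (speech_weight * math.log(speech_cands.get(word, floor))
                 + (1.0 - speech_weight) * math.log(gesture_cands.get(word, floor)))
        if score > best_score:
            best_word, best_score = word, score
    return best_word

def merge_utterance(speech_slots, gesture_slots, speech_weight=0.5):
    """Merge two aligned sequences of per-word candidate distributions."""
    return [merge_word_slot(s, g, speech_weight)
            for s, g in zip(speech_slots, gesture_slots)]

if __name__ == "__main__":
    # Speech confuses the homophones "meat"/"meet"; the gesture
    # keyboard is confident about the spelling, so the merged
    # result recovers the intended word.
    speech = [{"lets": 0.9, "ets": 0.1}, {"meat": 0.55, "meet": 0.45}]
    gesture = [{"lets": 0.8, "lots": 0.2}, {"meet": 0.85, "meat": 0.15}]
    print(merge_utterance(speech, gesture))  # -> ['lets', 'meet']
```

The example illustrates why the combination reduces word error rate: homophone confusions that plague speech recognition are exactly the cases where a spelling-based gesture keyboard is most reliable, and vice versa.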
Pages: 588+
Page count: 2
Related Papers
50 records in total
  • [21] Text Entry in Virtual Environments using Speech and a Midair Keyboard
    Adhikary, Jiban
    Vertanen, Keith
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2021, 27 (05) : 2648 - 2658
  • [22] This Is What's Important - Using Speech and Gesture to Create Focus in Multimodal Utterance
    Freigang, Farina
    Kopp, Stefan
    INTELLIGENT VIRTUAL AGENTS, IVA 2016, 2016, 10011 : 96 - 109
  • [23] Using speech to identify gesture pen strokes in collaborative, multimodal device descriptions
    Herold, James
    Stahovich, Thomas F.
AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING, 2011, 25 (03) : 237 - 254
  • [25] Improvement of multimodal gesture and speech recognition performance using time intervals between gestures and accompanying speech
    Miki, Madoka
    Kitaoka, Norihide
    Miyajima, Chiyomi
    Nishino, Takanori
    Takeda, Kazuya
EURASIP JOURNAL ON AUDIO SPEECH AND MUSIC PROCESSING, 2014
  • [26] Exploiting speech-gesture correlation in multimodal interaction
    Chen, Fang
    Choi, Eric H. C.
    Wang, Ning
    HUMAN-COMPUTER INTERACTION, PT 3, PROCEEDINGS, 2007, 4552 : 23 - +
  • [27] Multimodal encoding of motion events in speech, gesture and cognition
    Unal, Ercenur
    Mamus, Ezgi
    Ozyurek, Asli
    LANGUAGE AND COGNITION, 2024, 16 (04) : 785 - 804
  • [28] Text Entry in Immersive Head-Mounted Display-Based Virtual Reality Using Standard Keyboards
    Grubert, Jens
    Witzani, Lukas
    Ofek, Eyal
    Pahud, Michel
    Kranz, Matthias
    Kristensson, Per Ola
    25TH 2018 IEEE CONFERENCE ON VIRTUAL REALITY AND 3D USER INTERFACES (VR), 2018, : 159 - 166
  • [29] Emotion Classification from Speech and Text in Videos Using a Multimodal Approach
    Caschera, Maria Chiara
    Grifoni, Patrizia
    Ferri, Fernando
    MULTIMODAL TECHNOLOGIES AND INTERACTION, 2022, 6 (04)
  • [30] Speech recognition for command entry in multimodal interaction
    Tyfa, DA
    Howes, M
    INTERNATIONAL JOURNAL OF HUMAN-COMPUTER STUDIES, 2000, 52 (04) : 637 - 667