Multimodal Driver Interaction with Gesture, Gaze and Speech

Cited by: 9

Author:

Aftab, Abdul Rafey [1,2]

Affiliations:

[1] Univ Saarland, Saarbrucken, Germany

[2] BMW Grp, Munich, Germany

Source:

ICMI'19: PROCEEDINGS OF THE 2019 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2019

Keywords:

Data fusion; late fusion; speech commands; eye-tracking; head pose; gesture recognition; RNN; LSTM; CNN; direction

DOI:

10.1145/3340555.3356093

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

The ever-growing research in computer vision has created new avenues for user interaction. Speech commands and gesture recognition are already being applied in various touch-based inputs. It is therefore foreseeable that the use of multimodal input methods for user interaction is the next phase in development. In this paper, I propose a research plan of novel methods for the use of multimodal inputs for the semantic interpretation of human-computer interaction, specifically applied to a car driver. A fusion methodology has to be designed that adequately makes use of a recognized gesture (specifically finger pointing), eye gaze and head pose for the identification of reference objects, while using the semantics from speech for a natural interactive environment for the driver. The proposed plan includes different techniques based on artificial neural networks for the fusion of the camera-based modalities (gaze, head and gesture). It then combines features extracted from speech with the fusion algorithm to determine the intent of the driver.

Pages: 487-492

Page count: 6
