Multimodal Driver Interaction with Gesture, Gaze and Speech

Cited by: 9
Author
Aftab, Abdul Rafey [1,2]
Affiliations
[1] Saarland University, Saarbrücken, Germany
[2] BMW Group, Munich, Germany
Keywords
Data fusion; late fusion; speech commands; eye-tracking; head pose; gesture recognition; RNN; LSTM; CNN; direction
DOI
10.1145/3340555.3356093
Chinese Library Classification
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
The ever-growing research in computer vision has created new avenues for user interaction. Speech commands and gesture recognition are already being applied alongside various touch-based inputs. It is therefore foreseeable that multimodal input methods are the next phase in the development of user interaction. In this paper, I propose a research plan of novel methods for using multimodal inputs in the semantic interpretation of human-computer interaction, applied specifically to a car driver. A fusion methodology must be designed that adequately makes use of a recognized gesture (specifically finger pointing), eye gaze and head pose to identify reference objects, while using the semantics of speech to provide a natural interactive environment for the driver. The proposed plan includes different techniques based on artificial neural networks for fusing the camera-based modalities (gaze, head pose and gesture), and then combines features extracted from speech with the fusion algorithm to determine the driver's intent.
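The abstract names the ingredients of the planned pipeline (a neural-network late fusion of gaze, head pose and pointing gesture, followed by combination with speech features) without fixing an architecture. The PyTorch sketch below is one minimal, hypothetical reading of that pipeline; every module name, feature dimension and layer size is an illustrative assumption, not the author's actual design.

# Minimal sketch of the late-fusion idea from the abstract (PyTorch).
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class CameraFusionNet(nn.Module):
    """Fuses per-frame gaze, head-pose and pointing-gesture features
    over a short time window with an LSTM (hypothetical dimensions)."""
    def __init__(self, gaze_dim=3, head_dim=3, gesture_dim=3,
                 hidden_dim=64, n_objects=8):
        super().__init__()
        self.lstm = nn.LSTM(gaze_dim + head_dim + gesture_dim,
                            hidden_dim, batch_first=True)
        self.object_head = nn.Linear(hidden_dim, n_objects)

    def forward(self, gaze, head, gesture):
        # gaze / head / gesture: (batch, time, dim) direction features
        x = torch.cat([gaze, head, gesture], dim=-1)
        _, (h_n, _) = self.lstm(x)
        # Logits over candidate reference objects from the last hidden state
        return self.object_head(h_n[-1])

class IntentNet(nn.Module):
    """Combines the fused visual evidence with a pre-computed speech
    embedding to predict the driver's intent (an assumed design)."""
    def __init__(self, n_objects=8, speech_dim=32, n_intents=5):
        super().__init__()
        self.camera = CameraFusionNet(n_objects=n_objects)
        self.intent_head = nn.Linear(n_objects + speech_dim, n_intents)

    def forward(self, gaze, head, gesture, speech_emb):
        obj_logits = self.camera(gaze, head, gesture)
        # Late fusion: speech features join only at the decision stage
        fused = torch.cat([obj_logits, speech_emb], dim=-1)
        return obj_logits, self.intent_head(fused)

# Toy forward pass: 2 samples, 30 frames, 32-d speech embedding.
model = IntentNet()
gaze = torch.randn(2, 30, 3)
head = torch.randn(2, 30, 3)
gesture = torch.randn(2, 30, 3)
speech = torch.randn(2, 32)
obj_logits, intent_logits = model(gaze, head, gesture, speech)
print(obj_logits.shape, intent_logits.shape)  # (2, 8) (2, 5)

Keeping CameraFusionNet separate from the speech combination mirrors the "late fusion" keyword: the visual modalities first produce evidence for a referenced object, and speech semantics are merged only at the decision stage.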
Pages: 487-492
Page count: 6