Multimodal Driver Interaction with Gesture, Gaze and Speech

Cited by: 9
Author
Aftab, Abdul Rafey [1,2]
Affiliations
[1] Saarland University, Saarbrücken, Germany
[2] BMW Group, Munich, Germany
Keywords
Data fusion; late fusion; speech commands; eye-tracking; head pose; gesture recognition; RNN; LSTM; CNN; direction
DOI
10.1145/3340555.3356093
Chinese Library Classification
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
The ever-growing research in computer vision has created new avenues for user interaction. Speech commands and gesture recognition are already being applied alongside various touch-based inputs. It is therefore foreseeable that multimodal input methods are the next phase in the development of user interaction. In this paper, I propose a research plan of novel methods for using multimodal inputs in the semantic interpretation of human-computer interaction, applied specifically to a car driver. A fusion methodology must be designed that adequately makes use of a recognized gesture (specifically finger pointing), eye gaze and head pose to identify reference objects, while using the semantics of speech to provide a natural interactive environment for the driver. The proposed plan includes different techniques based on artificial neural networks for fusing the camera-based modalities (gaze, head pose and gesture), and then combines features extracted from speech with the fusion algorithm to determine the driver's intent.
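The abstract names the ingredients of the planned pipeline (a neural-network late fusion of gaze, head pose and pointing gesture, followed by combination with speech features) without fixing an architecture. The PyTorch sketch below is one minimal, hypothetical reading of that pipeline; every module name, feature dimension and layer size is an illustrative assumption, not the author's actual design.

# Minimal sketch of the late-fusion idea from the abstract (PyTorch).
# All dimensions and module names are illustrative assumptions.
import torch
import torch.nn as nn

class CameraFusionNet(nn.Module):
    """Fuses per-frame gaze, head-pose and pointing-gesture features
    over a short time window with an LSTM (hypothetical dimensions)."""
    def __init__(self, gaze_dim=3, head_dim=3, gesture_dim=3,
                 hidden_dim=64, n_objects=8):
        super().__init__()
        self.lstm = nn.LSTM(gaze_dim + head_dim + gesture_dim,
                            hidden_dim, batch_first=True)
        self.object_head = nn.Linear(hidden_dim, n_objects)

    def forward(self, gaze, head, gesture):
        # gaze / head / gesture: (batch, time, dim) direction features
        x = torch.cat([gaze, head, gesture], dim=-1)
        _, (h_n, _) = self.lstm(x)
        # Logits over candidate reference objects from the last hidden state
        return self.object_head(h_n[-1])

class IntentNet(nn.Module):
    """Combines the fused visual evidence with a pre-computed speech
    embedding to predict the driver's intent (an assumed design)."""
    def __init__(self, n_objects=8, speech_dim=32, n_intents=5):
        super().__init__()
        self.camera = CameraFusionNet(n_objects=n_objects)
        self.intent_head = nn.Linear(n_objects + speech_dim, n_intents)

    def forward(self, gaze, head, gesture, speech_emb):
        obj_logits = self.camera(gaze, head, gesture)
        # Late fusion: speech features join only at the decision stage
        fused = torch.cat([obj_logits, speech_emb], dim=-1)
        return obj_logits, self.intent_head(fused)

# Toy forward pass: 2 samples, 30 frames, 32-d speech embedding.
model = IntentNet()
gaze = torch.randn(2, 30, 3)
head = torch.randn(2, 30, 3)
gesture = torch.randn(2, 30, 3)
speech = torch.randn(2, 32)
obj_logits, intent_logits = model(gaze, head, gesture, speech)
print(obj_logits.shape, intent_logits.shape)  # (2, 8) (2, 5)

Keeping CameraFusionNet separate from the speech combination mirrors the "late fusion" keyword: the visual modalities first produce evidence for a referenced object, and speech semantics are merged only at the decision stage.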
Pages: 487-492
Page count: 6