Multimodal Driver Interaction with Gesture, Gaze and Speech

Cited by: 9
Author
Aftab, Abdul Rafey [1, 2]
Affiliations
[1] Univ Saarland, Saarbrucken, Germany
[2] BMW Grp, Munich, Germany
Keywords
Data fusion; late fusion; speech commands; eye-tracking; head pose; gesture recognition; RNN; LSTM; CNN; direction
DOI
10.1145/3340555.3356093
Chinese Library Classification (CLC) number
TP301 [Theory and methods]
Discipline classification code
081202
Abstract
The ever-growing body of research in computer vision has opened new avenues for user interaction. Speech commands and gesture recognition are already being applied alongside various touch-based inputs. It is therefore foreseeable that multimodal input methods will be the next phase in the development of user interaction. In this paper, I propose a research plan for novel methods that use multimodal inputs for the semantic interpretation of human-computer interaction, applied specifically to a car driver. A fusion methodology must be designed that makes adequate use of a recognized gesture (specifically, finger pointing), eye gaze, and head pose to identify referenced objects, while using the semantics of speech to provide a natural interactive environment for the driver. The proposed plan includes different techniques based on artificial neural networks for fusing the camera-based modalities (gaze, head pose, and gesture). Features extracted from speech are then combined with the output of this fusion to determine the driver's intent.
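The abstract describes fusing three camera-based cues (eye gaze, head pose, finger pointing) to identify a referenced object and then combining the result with speech to infer the driver's intent. The paper plans neural-network-based fusion; the sketch below instead uses a deliberately simple, hand-weighted late fusion (one of the listed keywords) so the data flow is easy to follow. It is not the author's method. Everything in it is a hypothetical assumption for illustration: the vehicle coordinate frame, the `Candidate` type, the Gaussian angular likelihood and its `sigma` values, the modality weights, and the keyword rule in `driver_intent`.

```python
import math
from dataclasses import dataclass

# Hypothetical vehicle frame: x forward, y left, z up, in metres.
Vec3 = tuple[float, float, float]


@dataclass
class Candidate:
    """A hypothetical reference object outside the car."""
    name: str
    position: Vec3


def angle_between(u: Vec3, v: Vec3) -> float:
    """Angle in radians between two 3D vectors; a zero vector counts as maximal error."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return math.pi
    cos = sum(a * b for a, b in zip(u, v)) / (nu * nv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def modality_scores(origin: Vec3, direction: Vec3,
                    candidates: list[Candidate], sigma: float) -> list[float]:
    """Turn one modality's ray into a probability per candidate.

    Assumption: Gaussian likelihood over the angle between the ray and the line
    from the ray origin to each object; sigma (radians) models modality noise.
    """
    raw = []
    for c in candidates:
        to_obj = tuple(p - o for p, o in zip(c.position, origin))
        err = angle_between(direction, to_obj)
        raw.append(math.exp(-0.5 * (err / sigma) ** 2))
    total = sum(raw)
    if total == 0.0:  # every likelihood underflowed: fall back to uniform
        return [1.0 / len(candidates)] * len(candidates)
    return [r / total for r in raw]


def late_fusion(per_modality: dict[str, list[float]],
                weights: dict[str, float]) -> list[float]:
    """Weighted sum of the per-modality probability vectors, renormalized."""
    n = len(next(iter(per_modality.values())))
    fused = [0.0] * n
    for modality, scores in per_modality.items():
        for i, s in enumerate(scores):
            fused[i] += weights[modality] * s
    total = sum(fused)
    return [f / total for f in fused]


def driver_intent(utterance: str, candidates: list[Candidate],
                  fused: list[float]) -> tuple[str, str]:
    """Toy speech step: a leading question word marks a query, anything else a command."""
    best = max(range(len(candidates)), key=lambda i: fused[i])
    words = utterance.lower().split()
    action = "query" if words and words[0] in {"what", "which", "where"} else "command"
    return action, candidates[best].name


candidates = [Candidate("gas station", (20.0, 5.0, 0.0)),
              Candidate("restaurant", (20.0, -5.0, 0.0))]
scores = {
    "gaze": modality_scores((0.0, 0.0, 0.0), (1.0, 0.25, 0.0), candidates, sigma=0.10),
    "head": modality_scores((0.0, 0.0, 0.0), (1.0, 0.10, 0.0), candidates, sigma=0.30),
    "pointing": modality_scores((0.5, -0.2, 0.0), (1.0, 0.27, 0.0), candidates, sigma=0.15),
}
fused = late_fusion(scores, {"gaze": 0.4, "head": 0.2, "pointing": 0.4})
print(driver_intent("What is that?", candidates, fused))  # → ('query', 'gas station')
```

Here the head pose alone is ambiguous (roughly 63% vs. 37%), but gaze and pointing both agree on the gas station, so the fused score settles on it. In the approach the abstract outlines, learned models (for example the RNN, LSTM, or CNN architectures named in the keywords) would take the place of the fixed `sigma` values and weights used here.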
Pages: 487-492
Page count: 6