Jointly Optimizing Sensing Pipelines for Multimodal Mixed Reality Interaction

Cited by: 3
Authors
Rathnayake, Darshana [1]
de Silva, Ashen [2]
Puwakdandawa, Dasun [2]
Meegahapola, Lakmal [3,4]
Misra, Archan [1]
Perera, Indika [2]
Affiliations
[1] Singapore Management Univ, Singapore, Singapore
[2] Univ Moratuwa, Moratuwa, Sri Lanka
[3] Idiap Res Inst, Martigny, Switzerland
[4] Ecole Polytech Fed Lausanne EPFL, Lausanne, Switzerland
Funding
National Research Foundation, Singapore;
Keywords
sensor fusion; mixed reality; multimodal interactions;
DOI
10.1109/MASS50613.2020.00046
CLC Number
TP3 [computing technology; computer technology];
Discipline Code
0812;
Abstract
Natural human interactions for Mixed Reality applications are overwhelmingly multimodal: humans communicate intent and instructions via a combination of visual, aural and gestural cues. However, supporting low-latency and accurate comprehension of such multimodal instructions (MMI) on resource-constrained wearable devices remains an open challenge, especially as the state-of-the-art comprehension techniques for each individual modality increasingly utilize complex Deep Neural Network models. We demonstrate the possibility of overcoming the core limitation of the latency-vs.-accuracy tradeoff by exploiting cross-modal dependencies, i.e., by compensating for the inferior performance of one model with the increased accuracy of a more complex model of a different modality. We present a sensor fusion architecture that performs MMI comprehension in a quasi-synchronous fashion, by fusing visual, speech and gestural input. The architecture is reconfigurable and supports dynamic modification of the complexity of the data processing pipeline for each individual modality in response to contextual changes. Using a representative "classroom" context and a set of four common interaction primitives, we then demonstrate how the choices between low- and high-complexity models for each individual modality are coupled. In particular, we show that (a) a judicious combination of low- and high-complexity models across modalities can offer a dramatic 3-fold decrease in comprehension latency together with a roughly 10-15% increase in accuracy, and (b) the right collective choice of models is context dependent, with the performance of some model combinations being significantly more sensitive to changes in scene context or choice of interaction.
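To make the abstract's core idea concrete, the following is a minimal Python sketch of context-driven, cross-modal model selection. It is not the paper's actual architecture: all names (ModelChoice, PROFILES, select_models) and all latency/accuracy numbers are illustrative assumptions. The sketch only captures the stated principle that, because per-modality pipelines run quasi-synchronously (in parallel), one can upgrade the modalities whose context is hardest to high-complexity models while a shared latency budget holds.

from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class ModelChoice:
    """Latency/accuracy profile of one candidate model for a modality (hypothetical)."""
    name: str
    latency_ms: float   # assumed per-inference latency
    accuracy: float     # assumed comprehension accuracy, 0..1

# Illustrative low/high complexity candidates per modality; real profiles
# would have to be measured offline for each device and context.
PROFILES: Dict[str, Dict[str, ModelChoice]] = {
    "vision":  {"low": ModelChoice("light-detector", 40.0, 0.78),
                "high": ModelChoice("heavy-detector", 180.0, 0.91)},
    "speech":  {"low": ModelChoice("keyword-spotter", 25.0, 0.82),
                "high": ModelChoice("full-asr", 120.0, 0.94)},
    "gesture": {"low": ModelChoice("imu-heuristic", 10.0, 0.75),
                "high": ModelChoice("dnn-gesture", 60.0, 0.90)},
}

def select_models(context_difficulty: Dict[str, float],
                  latency_budget_ms: float) -> Dict[str, ModelChoice]:
    """Greedy cross-modal selection: start every modality on its low-complexity
    model, then upgrade modalities in descending order of contextual difficulty
    as long as the end-to-end latency stays within budget."""
    chosen = {m: PROFILES[m]["low"] for m in PROFILES}
    for modality in sorted(context_difficulty,
                           key=context_difficulty.get, reverse=True):
        candidate = dict(chosen, **{modality: PROFILES[modality]["high"]})
        # Pipelines run in parallel, so end-to-end latency is the slowest one.
        if max(c.latency_ms for c in candidate.values()) <= latency_budget_ms:
            chosen = candidate
    return chosen

if __name__ == "__main__":
    # Example: a cluttered scene makes vision hard; a quiet room keeps speech easy.
    ctx = {"vision": 0.9, "speech": 0.3, "gesture": 0.5}
    for modality, model in select_models(ctx, latency_budget_ms=150.0).items():
        print(f"{modality}: {model.name} ({model.latency_ms} ms, acc {model.accuracy:.2f})")

Under this sketch's assumptions, vision is upgraded first but its heavy model would blow the 150 ms budget, so the budget is instead spent upgrading gesture and speech; this mirrors the paper's observation that the right collective model choice is coupled across modalities and context dependent.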
Pages: 309-317 (9 pages)