Multi-talker Verbal Interaction for Humanoid Robots

Cited by: 0
Authors
Klin, Bartlomiej [1 ]
Beniak, Ryszard [1 ]
Podpora, Michal [2 ]
Gardecki, Arkadiusz [1 ]
Rut, Joanna [3 ]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR 2024), 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Working in multi-talker mode is viable under certain conditions, such as fusing audio and video stimuli and applying smart adaptive beamforming to the received audio signals. In this article, the authors verify the part of their novel framework that adapts to dynamic changes in an interlocutor's location within the engagement zone of a humanoid robot during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of improving the accuracy of interlocutor signal isolation when video-analysis performance plummets. The authors identify the leading cause as insufficient video-analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
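The steering idea behind the abstract can be illustrated with a minimal delay-and-sum beamformer sketch. This is not the authors' implementation; the array geometry, sampling rate, and the function name `delay_and_sum` are illustrative assumptions. The steering angle would, in the framework described, come from video analysis of the interlocutor's location.

```python
import numpy as np

def delay_and_sum(frames, mic_x, angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward `angle_rad` (0 = broadside)
    by cancelling each channel's plane-wave arrival delay and averaging.

    frames : (n_mics, n_samples) array of synchronized mic signals
    mic_x  : (n_mics,) mic positions along the array axis, in metres
    """
    n = frames.shape[1]
    # Plane-wave delay of each mic relative to the array origin.
    delays = np.asarray(mic_x) * np.sin(angle_rad) / c      # seconds
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Advance each channel by its delay so the target direction aligns,
    # then average: coherent sum for the target, partial cancellation
    # for sources arriving from other angles.
    spec *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(spec.mean(axis=0), n=n)
```

If the angle estimate goes stale, e.g. because the video frame rate drops while the talker moves, the look direction no longer matches the talker and isolation degrades, which is the failure mode the abstract describes.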
Pages: 521-526
Page count: 6