Multi-talker Verbal Interaction for Humanoid Robots

Times Cited: 0
Authors
Klin, Bartlomiej [1]
Beniak, Ryszard [1]
Podpora, Michal [2]
Gardecki, Arkadiusz [1]
Rut, Joanna [3]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, MMAR 2024 | 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Operating in multi-talker mode is viable under certain conditions, such as the fusion of audio and video stimuli combined with smart adaptive beamforming of the received audio signals. In this article, the authors verify part of their novel framework, focusing on its adaptation to dynamic changes in the interlocutor's location within the engagement zone of a humanoid robot during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of increasing the accuracy of the interlocutor's signal isolation whenever video analysis performance drops sharply. The leading cause identified by the authors is insufficient video analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
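The abstract describes video-guided adaptive beamforming with an independent fallback when the video-derived estimate becomes stale. The sketch below (Python) illustrates that general idea only; it is not the authors' implementation. The delay-and-sum formulation, the function names, and the staleness threshold max_age_s are assumptions introduced for illustration.

import numpy as np

def steering_delays(mic_positions, azimuth_rad, c=343.0):
    # Illustrative helper (not from the paper): per-microphone delays (seconds)
    # that time-align a far-field source arriving from the given azimuth.
    # mic_positions: (M, 2) array of microphone x/y coordinates in metres.
    direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # Microphones closer to the source receive the wavefront earlier;
    # delaying each channel by its lead aligns all channels.
    return (mic_positions @ direction) / c

def delay_and_sum(frames, delays, fs):
    # Frequency-domain delay-and-sum beamformer.
    # frames: (M, N) array, one time-domain frame per microphone.
    # delays: (M,) per-microphone delays in seconds; fs: sampling rate in Hz.
    M, N = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    # Phase shifts implementing the per-channel delays.
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.mean(spectra * phase, axis=0), n=N)

def choose_azimuth(video_azimuth, video_age_s, audio_azimuth, max_age_s=0.2):
    # Prefer the video-derived direction; fall back to an audio-only estimate
    # when the video track is stale (e.g. the frame rate has dropped).
    if video_azimuth is not None and video_age_s <= max_age_s:
        return video_azimuth
    return audio_azimuth

if __name__ == "__main__":
    fs = 16000
    mics = np.array([[-0.05, 0.0], [0.05, 0.0]])   # two microphones 10 cm apart
    frames = np.random.default_rng(0).standard_normal((2, 1024))  # placeholder audio
    az = choose_azimuth(video_azimuth=0.3, video_age_s=0.05, audio_azimuth=0.5)
    out = delay_and_sum(frames, steering_delays(mics, az), fs)

The fallback in choose_azimuth mirrors the paper's conclusion that an independent method is needed when the video pipeline cannot deliver a fresh configuration; the choice of an audio-only direction estimate as that fallback is an assumption made here for the example.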
Pages: 521 - 526
Number of pages: 6