Multi-talker Verbal Interaction for Humanoid Robots

Times Cited: 0
Authors
Klin, Bartlomiej [1]
Beniak, Ryszard [1]
Podpora, Michal [2]
Gardecki, Arkadiusz [1]
Rut, Joanna [3]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, MMAR 2024 | 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Operating in multi-talker mode is viable under certain conditions, such as the fusion of audio and video stimuli combined with smart adaptive beamforming of the received audio signals. In this article, the authors verify part of their novel framework, focusing on its adaptation to dynamic changes in the interlocutor's location within the engagement zone of a humanoid robot during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of increasing the accuracy of the interlocutor's signal isolation whenever video analysis performance drops sharply. The leading cause identified by the authors is insufficient video analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
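The abstract describes video-guided adaptive beamforming with an independent fallback when the video-derived estimate becomes stale. The sketch below (Python) illustrates that general idea only; it is not the authors' implementation. The delay-and-sum formulation, the function names, and the staleness threshold max_age_s are assumptions introduced for illustration.

import numpy as np

def steering_delays(mic_positions, azimuth_rad, c=343.0):
    # Illustrative helper (not from the paper): per-microphone delays (seconds)
    # that time-align a far-field source arriving from the given azimuth.
    # mic_positions: (M, 2) array of microphone x/y coordinates in metres.
    direction = np.array([np.cos(azimuth_rad), np.sin(azimuth_rad)])
    # Microphones closer to the source receive the wavefront earlier;
    # delaying each channel by its lead aligns all channels.
    return (mic_positions @ direction) / c

def delay_and_sum(frames, delays, fs):
    # Frequency-domain delay-and-sum beamformer.
    # frames: (M, N) array, one time-domain frame per microphone.
    # delays: (M,) per-microphone delays in seconds; fs: sampling rate in Hz.
    M, N = frames.shape
    spectra = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    # Phase shifts implementing the per-channel delays.
    phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(np.mean(spectra * phase, axis=0), n=N)

def choose_azimuth(video_azimuth, video_age_s, audio_azimuth, max_age_s=0.2):
    # Prefer the video-derived direction; fall back to an audio-only estimate
    # when the video track is stale (e.g. the frame rate has dropped).
    if video_azimuth is not None and video_age_s <= max_age_s:
        return video_azimuth
    return audio_azimuth

if __name__ == "__main__":
    fs = 16000
    mics = np.array([[-0.05, 0.0], [0.05, 0.0]])   # two microphones 10 cm apart
    frames = np.random.default_rng(0).standard_normal((2, 1024))  # placeholder audio
    az = choose_azimuth(video_azimuth=0.3, video_age_s=0.05, audio_azimuth=0.5)
    out = delay_and_sum(frames, steering_delays(mics, az), fs)

The fallback in choose_azimuth mirrors the paper's conclusion that an independent method is needed when the video pipeline cannot deliver a fresh configuration; the choice of an audio-only direction estimate as that fallback is an assumption made here for the example.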
Pages: 521 - 526
Number of pages: 6