Multi-talker Verbal Interaction for Humanoid Robots

Cited by: 0
Authors
Klin, Bartlomiej [1 ]
Beniak, Ryszard [1 ]
Podpora, Michal [2 ]
Gardecki, Arkadiusz [1 ]
Rut, Joanna [3 ]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS (MMAR 2024), 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
CLC classification
TP [Automation technology; computer technology]
Discipline code
0812
Abstract
Working in multi-talker mode is viable under certain conditions, such as fusing audio and video stimuli and applying smart adaptive beamforming to the received audio signals. In this article, the authors verify the part of their novel framework that adapts to dynamic changes in an interlocutor's location within the engagement zone of a humanoid robot during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of improving the accuracy of interlocutor signal isolation when video-analysis performance plummets. The authors identify the leading cause as insufficient video-analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
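The steering idea behind the abstract can be illustrated with a minimal delay-and-sum beamformer sketch. This is not the authors' implementation; the array geometry, sampling rate, and the function name `delay_and_sum` are illustrative assumptions. The steering angle would, in the framework described, come from video analysis of the interlocutor's location.

```python
import numpy as np

def delay_and_sum(frames, mic_x, angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward `angle_rad` (0 = broadside)
    by cancelling each channel's plane-wave arrival delay and averaging.

    frames : (n_mics, n_samples) array of synchronized mic signals
    mic_x  : (n_mics,) mic positions along the array axis, in metres
    """
    n = frames.shape[1]
    # Plane-wave delay of each mic relative to the array origin.
    delays = np.asarray(mic_x) * np.sin(angle_rad) / c      # seconds
    spec = np.fft.rfft(frames, axis=1)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # Advance each channel by its delay so the target direction aligns,
    # then average: coherent sum for the target, partial cancellation
    # for sources arriving from other angles.
    spec *= np.exp(2j * np.pi * freqs[None, :] * delays[:, None])
    return np.fft.irfft(spec.mean(axis=0), n=n)
```

If the angle estimate goes stale, e.g. because the video frame rate drops while the talker moves, the look direction no longer matches the talker and isolation degrades, which is the failure mode the abstract describes.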
Pages: 521-526
Page count: 6