Multi-talker Verbal Interaction for Humanoid Robots

Cited by: 0
Authors
Klin, Bartlomiej [1 ]
Beniak, Ryszard [1 ]
Podpora, Michal [2 ]
Gardecki, Arkadiusz [1 ]
Rut, Joanna [3 ]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, MMAR 2024 | 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
CLC Classification
TP [Automation technology; computer technology];
Discipline Code
0812 ;
Abstract
Working in multi-talker mode is viable under certain conditions, such as fusing audio and video stimuli with smart adaptive beamforming of the received audio signals. In this article, the authors verify part of a novel framework under research, focusing on its adaptation to dynamic changes in the interlocutor's location within a humanoid robot's engagement zone during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of improving the accuracy of interlocutor signal isolation when video-analysis performance plummets. The authors identify the leading cause as insufficient video-analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
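The record does not include the framework's implementation, so as an illustrative sketch only: the abstract's idea of steering the microphone array toward a video-estimated interlocutor direction can be approximated with classic delay-and-sum beamforming. The function name, array geometry, and parameters below are assumptions for illustration, not the authors' method.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward a given azimuth by
    delaying each channel and summing (delay-and-sum beamforming).

    signals:       (n_mics, n_samples) time-aligned recordings
    mic_positions: (n_mics,) positions along the array axis in metres
    angle_rad:     steering angle relative to broadside (e.g. from
                   a video-based estimate of the interlocutor's mouth)
    fs:            sample rate in Hz; c: speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    # Per-microphone arrival delays for a plane wave from angle_rad
    delays = mic_positions * np.sin(angle_rad) / c
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for ch in range(n_mics):
        # Apply a fractional-sample delay in the frequency domain
        spectrum = np.fft.rfft(signals[ch])
        shifted = spectrum * np.exp(-2j * np.pi * freqs * delays[ch])
        out += np.fft.irfft(shifted, n=n_samples)
    return out / n_mics
```

In the scenario the abstract describes, `angle_rad` would be refreshed from video analysis; when the frame rate drops and the estimate goes stale, the beam stays pointed at the old location, which is precisely the isolation-accuracy failure the authors report.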
Pages: 521 - 526
Page count: 6