Multi-talker Verbal Interaction for Humanoid Robots

Cited by: 0
Authors
Klin, Bartlomiej [1 ]
Beniak, Ryszard [1 ]
Podpora, Michal [2 ]
Gardecki, Arkadiusz [1 ]
Rut, Joanna [3 ]
Affiliations
[1] Opole Univ Technol, Fac Elect Engn Automat Control & Informat, Opole, Poland
[2] Opole Univ Technol, Dept Comp Sci, Opole, Poland
[3] Opole Univ Technol, Fac Prod Engn & Logist, Opole, Poland
Source
2024 28TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, MMAR 2024 | 2024
Keywords
Smart beamforming; Human-Computer Interaction; Software-Hardware Integration for Robot Systems; Long-term Interaction; Multi-Modal Perception for HRI; Natural Dialog for HRI; Design and Human Factors;
DOI
10.1109/MMAR62187.2024.10680820
CLC Classification
TP [Automation technology; computer technology];
Discipline Code
0812 ;
Abstract
Working in multi-talker mode is viable under certain conditions, such as fusing audio and video stimuli with smart adaptive beamforming of the received audio signals. In this article, the authors verify part of a novel framework under research, focusing on its adaptation to dynamic changes in the interlocutor's location within a humanoid robot's engagement zone during multi-talker conversation. The evaluation confirms the need for a complementary, independent method of improving the accuracy of interlocutor signal isolation when video-analysis performance plummets. The authors identify the leading cause as insufficient video-analysis performance during dynamic conversations: when the interlocutor's speech apparatus moves beyond the expected margin and the video frame rate drops, the video analysis cannot derive a new beamforming configuration.
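The record does not include the framework's implementation, so as an illustrative sketch only: the abstract's idea of steering the microphone array toward a video-estimated interlocutor direction can be approximated with classic delay-and-sum beamforming. The function name, array geometry, and parameters below are assumptions for illustration, not the authors' method.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_rad, fs, c=343.0):
    """Steer a linear microphone array toward a given azimuth by
    delaying each channel and summing (delay-and-sum beamforming).

    signals:       (n_mics, n_samples) time-aligned recordings
    mic_positions: (n_mics,) positions along the array axis in metres
    angle_rad:     steering angle relative to broadside (e.g. from
                   a video-based estimate of the interlocutor's mouth)
    fs:            sample rate in Hz; c: speed of sound in m/s
    """
    n_mics, n_samples = signals.shape
    # Per-microphone arrival delays for a plane wave from angle_rad
    delays = mic_positions * np.sin(angle_rad) / c
    freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
    out = np.zeros(n_samples)
    for ch in range(n_mics):
        # Apply a fractional-sample delay in the frequency domain
        spectrum = np.fft.rfft(signals[ch])
        shifted = spectrum * np.exp(-2j * np.pi * freqs * delays[ch])
        out += np.fft.irfft(shifted, n=n_samples)
    return out / n_mics
```

In the scenario the abstract describes, `angle_rad` would be refreshed from video analysis; when the frame rate drops and the estimate goes stale, the beam stays pointed at the old location, which is precisely the isolation-accuracy failure the authors report.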
Pages: 521 - 526
Page count: 6