Speaker-Adaptive Multimodal Prediction Model for Listener Responses

Cited by: 5
Authors
de Kok, Iwan [1]
Heylen, Dirk [1]
Morency, Louis-Philippe [2]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
[2] USC Inst Creat Technol, Los Angeles, CA USA
Keywords
Algorithms; Human Factors; Theory; Listener Responses; Machine Learning; Social Behavior; Multimodal; Features
DOI
10.1145/2522848.2522866
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and to build a predictive algorithm for listener responses that can adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions in which speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction, our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile", which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by demonstrating an improvement over a state-of-the-art approach that does not adapt to the speaker. Besides the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
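The matching step described in the abstract can be illustrated as a nearest-prototype lookup followed by a style-specific predictor. This is only a minimal sketch, not the authors' implementation: the two-dimensional profile, the Euclidean distance, the prototype names, and the threshold predictors are all hypothetical choices made for illustration.

```python
import math

# Hypothetical speaker profiles: (speech rate, mean pause length), both scaled to [0, 1].
PROTOTYPES = {
    "style_a": (0.8, 0.2),  # assumed: fast speaker, short pauses
    "style_b": (0.2, 0.9),  # assumed: slow speaker, long pauses
}

# Hypothetical per-style predictors: True when a listener response (e.g. a head nod)
# would be predicted for the current pause length.
MODELS = {
    "style_a": lambda pause: pause > 0.3,
    "style_b": lambda pause: pause > 0.7,
}


def nearest_prototype(profile, prototypes):
    """Return the name of the prototype whose profile is closest to `profile`."""
    best_name, best_dist = None, math.inf
    for name, proto in prototypes.items():
        dist = math.dist(profile, proto)  # Euclidean distance (an assumption)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name


def predict_response(profile, pause, prototypes=PROTOTYPES, models=MODELS):
    """Pick the closest prototypical style, then apply that style's predictor."""
    style = nearest_prototype(profile, prototypes)
    return style, models[style](pause)


if __name__ == "__main__":
    # A new speaker who talks fairly fast with short pauses.
    print(predict_response((0.7, 0.3), pause=0.5))
```

In this toy setup the same pause length can trigger a response under one style and not under another, which is the kind of speaker-dependent behavior the abstract motivates.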
Pages: 51-58
Number of pages: 8