Speaker-Adaptive Multimodal Prediction Model for Listener Responses

Cited by: 5
Authors
de Kok, Iwan [1]
Heylen, Dirk [1]
Morency, Louis-Philippe [2]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
[2] USC Inst Creat Technol, Los Angeles, CA USA
Keywords
Algorithms; Human Factors; Theory; Listener Responses; Machine Learning; Social Behavior; Multimodal; FEATURES;
DOI
10.1145/2522848.2522866
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions where speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile" which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach which does not adapt to the speaker. Besides the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
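The matching step described in the abstract (pick the closest prototypical speaker style, then predict with that style's model) can be illustrated with a minimal sketch. The abstract does not specify the profile features, the distance measure, or the per-style predictors, so everything below is an assumption made for illustration: the two-dimensional profiles, the Euclidean distance, the names `closest_prototype` and `predict_listener_response`, and the toy rule-based models are all hypothetical and are not the authors' implementation.

```python
import math

def closest_prototype(profile, prototypes):
    """Return the name of the prototype whose profile is nearest to `profile`.

    Euclidean distance is an assumption; the abstract does not name a metric.
    """
    return min(prototypes, key=lambda name: math.dist(profile, prototypes[name]))

def predict_listener_response(profile, features, prototypes, models):
    """Select the closest prototypical style and apply that style's predictor."""
    style = closest_prototype(profile, prototypes)
    return style, models[style](features)

# Hypothetical prototype profiles (e.g. two summary statistics per speaker).
prototypes = {
    "style_a": [0.2, 0.8],
    "style_b": [0.9, 0.1],
}

# Hypothetical per-style predictors: True means "a listener response is due".
models = {
    "style_a": lambda f: f["pause"] > 0.5,
    "style_b": lambda f: f["gaze_at_listener"],
}

new_speaker_profile = [0.8, 0.2]
current_features = {"pause": 0.1, "gaze_at_listener": True}
style, respond = predict_listener_response(
    new_speaker_profile, current_features, prototypes, models
)
print(style, respond)
```

In this toy setup the new speaker is matched to one stored style, and only that style's predictor decides whether a response such as a head nod is due.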
Pages: 51-58
Number of pages: 8
Related papers
50 items in total
  • [41] Emotional Voice Conversion Using a Hybrid Framework With Speaker-Adaptive DNN and Particle-Swarm-Optimized Neural Network
    Vekkot, Susmitha
    Gupta, Deepa
    Zakariah, Mohammed
    Alotaibi, Yousef Ajami
    IEEE ACCESS, 2020, 8 : 74627 - 74647
  • [42] Speaker-Adaptive Speech Synthesis Based on Eigenvoice Conversion and Language-Dependent Prosodic Conversion in Speech-to-Speech Translation
    Hattori, Nobuhiko
    Toda, Tomoki
    Kawai, Hisashi
    Saruwatari, Hiroshi
    Shikano, Kiyohiro
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 2780 - +
  • [43] Deep learning-based speaker-adaptive postfiltering with limited adaptation data for embedded text-to-speech synthesis systems
    Eren, Eray
    Demiroglu, Cenk
    COMPUTER SPEECH AND LANGUAGE, 2023, 81
  • [44] Adaptive Individual Background Model for Speaker Verification
    Bar-Yosef, Yossi
    Bistritz, Yuval
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 1279 - 1282
  • [45] Multimodal speaker identification using an adaptive classifier cascade based on modality reliability
    Erzin, E
    Yemez, Y
    Tekalp, AM
    IEEE TRANSACTIONS ON MULTIMEDIA, 2005, 7 (05) : 840 - 852
  • [46] Engagement recognition by a latent character model based on multimodal listener behaviors in spoken dialogue
    Inoue, Koji
    Lala, Divesh
    Takanashi, Katsuya
    Kawahara, Tatsuya
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2018, 7 (07)
  • [47] Adaptive Multilevel Prediction Method for Dynamic Multimodal Optimization
    Ahrari, Ali
    Elsayed, Saber
    Sarker, Ruhul
    Essam, Daryl
    Coello Coello, Carlos A.
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2021, 25 (03) : 463 - 477
  • [48] A multimodal model for protein function prediction
    Yu Mao
    WenHui Xu
    Yue Shun
    LongXin Chai
    Lei Xue
    Yong Yang
    Mei Li
    Scientific Reports, 15 (1)
  • [49] 4D Multimodal Speaker Model for Remote Speech Diagnosis
    Krecichwost, Michal
    Sage, Agata
    Miodonska, Zuzanna
    Badura, Pawel
    IEEE ACCESS, 2022, 10 : 93187 - 93202
  • [50] Who Speaks Next? Turn Change and Next Speaker Prediction in Multimodal Multiparty Interaction
    Malik, Usman
    Saunier, Julien
    Funakoshi, Kotaro
    Pauchet, Alexandre
    2020 IEEE 32ND INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI), 2020, : 349 - 354