Speaker-Adaptive Multimodal Prediction Model for Listener Responses

Cited by: 5
Authors
de Kok, Iwan [1]
Heylen, Dirk [1]
Morency, Louis-Philippe [2]
Affiliations
[1] Univ Twente, Human Media Interact, Enschede, Netherlands
[2] USC Inst Creat Technol, Los Angeles, CA USA
Keywords
Algorithms; Human Factors; Theory; Listener Responses; Machine Learning; Social Behavior; Multimodal; FEATURES;
DOI
10.1145/2522848.2522866
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The goal of this paper is to analyze and model the variability in speaking styles in dyadic interactions and build a predictive algorithm for listener responses that is able to adapt to these different styles. The end result of this research will be a virtual human able to automatically respond to a human speaker with proper listener responses (e.g., head nods). Our novel speaker-adaptive prediction model is created from a corpus of dyadic interactions where speaker variability is analyzed to identify a subset of prototypical speaker styles. During a live interaction our prediction model automatically identifies the closest prototypical speaker style and predicts listener responses based on this "communicative style". Central to our approach is the idea of a "speaker profile" which uniquely identifies each speaker and enables the matching between prototypical speakers and new speakers. The paper shows the merits of our speaker-adaptive listener response prediction model by showing improvement over a state-of-the-art approach which does not adapt to the speaker. Besides the merits of speaker adaptation, our experiments highlight the importance of using multimodal features when comparing speakers to select the closest prototypical speaker style.
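The matching step described in the abstract (compare a new speaker's profile with prototypical speakers, then predict with the closest style) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the abstract does not specify the profile features, the distance measure, or the prediction models, so the two-dimensional profiles, the Euclidean distance, the style names, and the toy scoring functions below are hypothetical stand-ins, not the authors' method.

```python
import math
from typing import Callable, Dict, Sequence

Profile = Sequence[float]                       # hypothetical speaker profile vector
Predictor = Callable[[Sequence[float]], float]  # frame features -> response score


def distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two equal-length profiles (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_style(profile: Profile, prototypes: Dict[str, Profile]) -> str:
    """Return the name of the prototypical style nearest to the given profile."""
    return min(prototypes, key=lambda name: distance(profile, prototypes[name]))


def predict_listener_response(
    profile: Profile,
    frame: Sequence[float],
    prototypes: Dict[str, Profile],
    predictors: Dict[str, Predictor],
) -> float:
    """Score a frame with the predictor attached to the closest style."""
    style = closest_style(profile, prototypes)
    return predictors[style](frame)


# Hypothetical prototypes and toy per-style predictors (illustration only).
PROTOTYPES: Dict[str, Profile] = {
    "expressive": [0.8, 0.6],
    "reserved": [0.2, 0.1],
}
PREDICTORS: Dict[str, Predictor] = {
    "expressive": lambda frame: min(1.0, 0.5 * frame[0] + 0.5 * frame[1]),
    "reserved": lambda frame: min(1.0, 0.9 * frame[0]),
}

new_speaker = [0.7, 0.5]  # hypothetical profile of a previously unseen speaker
style = closest_style(new_speaker, PROTOTYPES)
score = predict_listener_response(new_speaker, [1.0, 0.0], PROTOTYPES, PREDICTORS)
print(style, score)
```

In this sketch the per-style predictors are interchangeable callables, so a trained sequence model could replace the toy lambdas without changing the selection logic.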
Pages: 51-58
Number of pages: 8