A multimodal approach for modeling engagement in conversation

Cited by: 2
Authors
Pellet-Rostaing, Arthur [1 ,2 ]
Bertrand, Roxane [1 ,2 ]
Boudin, Auriane [1 ,2 ]
Rauzy, Stephane [1 ,2 ]
Blache, Philippe [1 ,2 ]
Affiliations
[1] CNRS, Lab Parole & Langage LPL, Aix En Provence, France
[2] Inst Language Commun & Brain ILCB, Marseille, France
Keywords
engagement model; multimodality; conversational skills; conversational agents; engagement classification; annotated corpora; USER ENGAGEMENT; HUMANS;
DOI
10.3389/fcomp.2023.1062342
CLC number
TP39 [Computer applications];
Subject classification codes
081203 ; 0835 ;
Abstract
Recently, engagement has emerged as a key variable explaining the success of conversation. From the perspective of human-machine interaction, an automatic assessment of engagement becomes crucial to better understand the dynamics of an interaction and to design socially aware robots. This paper presents a predictive model of the level of engagement in conversation. In particular, it shows the value of using a rich multimodal set of features, outperforming the existing models in this domain. Methodologically, the study is based on two audio-visual corpora of naturalistic face-to-face interactions. These resources have been enriched with annotations of various verbal and nonverbal behaviors, such as smiles, head nods, and feedback. In addition, we manually annotated gesture intensity. Based on a review of previous work in psychology and human-machine interaction, we propose a new definition of the notion of engagement, adequate for describing this phenomenon in both natural and mediated environments. This definition has been implemented in our annotation scheme. In our work, engagement is studied at the turn level, known to be crucial for the organization of conversation. Even though there is still a lack of consensus on the precise definition of turns, we have developed a turn-detection tool. A multimodal characterization of engagement is performed using a multi-level classification of turns. We claim that a set of multimodal cues, involving prosodic, mimo-gestural, and morpho-syntactic information, is relevant for characterizing the level of engagement of speakers in conversation. Our results significantly outperform the baseline and reach state-of-the-art level (0.76 weighted F-score). The most contributing modalities are identified by testing the performance of a two-layer perceptron trained on unimodal feature sets and on combinations of two to four modalities. These results support our claim about multimodality: combining features related to speech fundamental frequency and energy with mimo-gestural features leads to the best performance.
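The modality-ablation procedure the abstract describes — training a two-layer perceptron on each unimodal feature set and on modality combinations, then comparing weighted F-scores — can be sketched as follows. This is not the authors' code: the modality names, feature dimensions, and synthetic turn-level data are purely illustrative assumptions, standing in for the corpus features.

```python
# Sketch of the modality-comparison setup: a two-layer perceptron (one hidden
# layer) scored with weighted F1 on each modality subset. Data are synthetic.
from itertools import combinations

import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n_turns = 400

# Assumed modality groups (illustrative): prosodic (f0/energy), mimo-gestural,
# morpho-syntactic features, one row per conversational turn.
modalities = {
    "prosody": rng.normal(size=(n_turns, 6)),
    "mimogestural": rng.normal(size=(n_turns, 4)),
    "morphosyntax": rng.normal(size=(n_turns, 5)),
}
# Synthetic multi-level engagement labels (e.g. low / medium / high) per turn.
y = rng.integers(0, 3, size=n_turns)

def weighted_f1(feature_sets):
    """Train a two-layer perceptron on the concatenated feature sets."""
    X = np.hstack([modalities[m] for m in feature_sets])
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    clf.fit(X_tr, y_tr)
    return f1_score(y_te, clf.predict(X_te), average="weighted")

# Score every unimodal set and every combination of the modality groups.
scores = {
    combo: weighted_f1(combo)
    for r in range(1, len(modalities) + 1)
    for combo in combinations(modalities, r)
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 2))
```

With real corpus features, the combination whose score peaks identifies the most contributing modalities; on the synthetic data above the ranking is of course meaningless.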
Pages: 14