Emergent leaders through looking and speaking: from audio-visual data to multimodal recognition

Cited: 0
Authors
Dairazalia Sanchez-Cortes
Oya Aran
Dinesh Babu Jayagopi
Marianne Schmid Mast
Daniel Gatica-Perez
Affiliations
[1] Idiap Research Institute, Centre du Parc
[2] Ecole Polytechnique Fédérale de Lausanne (EPFL)
[3] University of Neuchatel, Institut de Psychologie du Travail et des Organisations
Source
Journal on Multimodal User Interfaces, 2013, 7 (1-2)
Keywords
Emergent leadership; Nonverbal behavior; Multimodal cues; Small group interactions
DOI
Not available
Abstract
In this paper we present a multimodal analysis of emergent leadership in small groups using audio-visual features, and discuss our experience in designing and collecting a data corpus for this purpose. The ELEA Audio-Visual Synchronized corpus (ELEA AVS) was collected using a lightweight portable setup and contains recordings of small group meetings. The participants in each group performed the winter survival task and filled in questionnaires related to personality and to several social concepts such as leadership and dominance. In addition, the corpus includes annotations of the participants' performance in the survival task, as well as annotations of social concepts by external observers. Based on this corpus, we demonstrate the feasibility of predicting the emergent leader in small groups from automatically extracted audio and visual features based on speaking turns and visual attention, focusing specifically on multimodal features that combine the two modalities, namely measures of looking at participants while speaking and while not speaking. Our findings indicate that emergent leadership is related to, but not equivalent to, dominance, and that while multimodal features are moderately effective at inferring the leader, much simpler features extracted from the audio channel alone give better performance.
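Although this page only carries the abstract, the cue-extraction step it describes is concrete enough to sketch. The Python snippet below is a minimal illustration, not the authors' implementation: the frame-level annotation format, the function name leadership_cues, and the final speaking-time ranking rule are all assumptions. It shows how per-participant speaking time and the joint audio-visual measures (looking or being looked at while speaking, and being looked at while not speaking) could be accumulated from synchronized speaking-status and gaze-target streams.

```python
from collections import defaultdict

def leadership_cues(speaking, gaze):
    """Accumulate per-participant nonverbal cue counts.

    speaking: {participant: [0/1 speaking status per synchronized frame]}
    gaze:     {participant: [id of the looked-at participant, or None]}
    (Hypothetical annotation format, assumed for illustration.)
    """
    n_frames = len(next(iter(speaking.values())))
    cues = {p: defaultdict(int) for p in speaking}
    for t in range(n_frames):
        # Audio cue: total speaking time per participant.
        for p, status in speaking.items():
            cues[p]["speaking_time"] += status[t]
        for viewer, targets in gaze.items():
            target = targets[t]
            if target is None:
                continue
            # Multimodal cue from the viewer's side: looking at someone
            # while holding the floor ...
            if speaking[viewer][t]:
                cues[viewer]["looking_while_speaking"] += 1
            # ... and from the target's side: receiving visual attention
            # while speaking vs. while silent.
            if speaking[target][t]:
                cues[target]["looked_at_while_speaking"] += 1
            else:
                cues[target]["looked_at_while_not_speaking"] += 1
    return cues

# Toy example: three participants, six synchronized frames.
speaking = {"A": [1, 1, 1, 0, 0, 1], "B": [0, 0, 0, 1, 1, 0], "C": [0] * 6}
gaze = {"A": ["B", "B", "C", "B", "B", "C"],
        "B": ["A", "A", "A", None, "C", "A"],
        "C": ["A", "A", "A", "B", "B", "A"]}
cues = leadership_cues(speaking, gaze)
# A simple rule in the spirit of the audio-only baseline: predict the
# participant with the most speaking time as the emergent leader.
leader = max(cues, key=lambda p: cues[p]["speaking_time"])
print(leader, dict(cues[leader]))
```

Under this framing, the abstract's finding that simple audio features outperform the multimodal ones corresponds to ranking the group by speaking_time alone, as in the last lines of the sketch.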
Pages: 39-53
Page count: 14
Related papers
50 in total
  • [1] Emergent leaders through looking and speaking: from audio-visual data to multimodal recognition
    Sanchez-Cortes, Dairazalia
    Aran, Oya
    Jayagopi, Dinesh Babu
Schmid Mast, Marianne
    Gatica-Perez, Daniel
    [J]. JOURNAL ON MULTIMODAL USER INTERFACES, 2013, 7 (1-2) : 39 - 53
  • [2] Multimodal Learning Using 3D Audio-Visual Data for Audio-Visual Speech Recognition
    Su, Rongfeng
    Wang, Lan
    Liu, Xunying
    [J]. 2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 40 - 43
  • [3] Restoring Speaking Lips from Occlusion for Audio-Visual Speech Recognition
    Wang, Jiadong
    Pan, Zexu
    Zhang, Malu
    Tan, Robby T.
    Li, Haizhou
    [J]. THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 17, 2024, : 19144 - 19152
  • [4] DEEP MULTIMODAL LEARNING FOR AUDIO-VISUAL SPEECH RECOGNITION
    Mroueh, Youssef
    Marcheret, Etienne
    Goel, Vaibhava
    [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 2130 - 2134
  • [5] An audio-visual corpus for multimodal automatic speech recognition
    Czyzewski, Andrzej
    Kostek, Bozena
    Bratoszewski, Piotr
    Kotus, Jozef
    Szykulski, Marcin
    [J]. JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2017, 49 (02) : 167 - 192
  • [6] Data Augmentation for Audio-Visual Emotion Recognition with an Efficient Multimodal Conditional GAN
    Ma, Fei
    Li, Yang
    Ni, Shiguang
    Huang, Shao-Lun
    Zhang, Lin
    [J]. APPLIED SCIENCES-BASEL, 2022, 12 (01):
  • [7] Multimodal and Temporal Perception of Audio-visual Cues for Emotion Recognition
    Ghaleb, Esam
    Popa, Mirela
    Asteriadis, Stylianos
    [J]. 2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2019,
  • [8] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    [J]. INFORMATION FUSION, 2022, 85 : 52 - 59
  • [9] Multimodal Emotion Recognition using Physiological and Audio-Visual Features
    Matsuda, Yuki
    Fedotov, Dmitrii
    Takahashi, Yuta
    Arakawa, Yutaka
    Yasumoto, Keiichi
    Minker, Wolfgang
    [J]. PROCEEDINGS OF THE 2018 ACM INTERNATIONAL JOINT CONFERENCE ON PERVASIVE AND UBIQUITOUS COMPUTING AND PROCEEDINGS OF THE 2018 ACM INTERNATIONAL SYMPOSIUM ON WEARABLE COMPUTERS (UBICOMP/ISWC'18 ADJUNCT), 2018, : 946 - 951