Deciphering the Silent Participant: On the Use of Audio-Visual Cues for the Classification of Listener Categories in Group Discussions

Cited by: 11

Authors:

Oertel, Catharine [1]

Mora, Kenneth A. Funes [2]

Gustafson, Joakim [1]

Odobez, Jean-Marc [2]

Affiliations:

[1] KTH Royal Inst Technol, Linstedtsvagen 44, Stockholm, Sweden

[2] Ecole Polytech Fed Lausanne, Idiap Res Inst, CH-1015 Lausanne, Switzerland

Source:

ICMI'15: PROCEEDINGS OF THE 2015 ACM INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION | 2015

Keywords:

listener categories; non-verbal cues; eye-gaze

DOI:

10.1145/2818346.2820759

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Estimating a silent participant's degree of engagement and role within a group discussion can be challenging, as no speech-related cues are available at that time. Having this information, however, can provide important insights into the dynamics of the group as a whole. In this paper, we study the classification of listeners into several categories (attentive listener, side participant and bystander). We devised a thin-sliced perception test in which subjects were asked to assess listener roles and engagement levels in 15-second video clips taken from a corpus of group interviews. Results show that humans are usually able to assess silent participants' roles. Using these annotations, we examined a set of multimodal low-level features, such as past speaking activity, backchannels (both visual and verbal) and gaze patterns, and identified those that distinguish between the listener categories. Moreover, the results show that many of the audiovisual effects observed in listeners in dyadic interactions also hold for multi-party interactions. A preliminary classifier achieves an accuracy of 64%.
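The abstract names three kinds of low-level cues (past speaking activity, visual and verbal backchannels, gaze) and a three-way listener classification. The Python sketch below illustrates, under clearly stated assumptions, how such cues might be turned into per-clip features and passed to a simple classifier. The per-frame boolean annotation format, the names `ListenerFeatures`, `extract_features`, `fit_centroids` and `classify`, and the choice of a nearest-centroid classifier are all hypothetical. This record does not say which exact features or classifier the authors used, and the sketch makes no attempt to reproduce the reported 64% accuracy.

```python
import math
from dataclasses import dataclass

CLIP_SECONDS = 15.0  # thin-slice length used in the perception test

@dataclass(frozen=True)
class ListenerFeatures:
    # Hypothetical feature set loosely following the cue types named in the abstract.
    past_speaking_ratio: float    # fraction of the preceding context in which this listener spoke
    gaze_at_speaker_ratio: float  # fraction of clip frames with the listener's gaze on the current speaker
    backchannel_rate: float       # visual + verbal backchannels per second of clip

    def as_vector(self):
        return (self.past_speaking_ratio, self.gaze_at_speaker_ratio, self.backchannel_rate)

def extract_features(past_speaking, gaze_on_speaker, n_backchannels, clip_seconds=CLIP_SECONDS):
    """past_speaking / gaze_on_speaker: per-frame booleans (assumed annotation format)."""
    if not past_speaking or not gaze_on_speaker:
        raise ValueError("annotation streams must be non-empty")
    return ListenerFeatures(
        past_speaking_ratio=sum(past_speaking) / len(past_speaking),
        gaze_at_speaker_ratio=sum(gaze_on_speaker) / len(gaze_on_speaker),
        backchannel_rate=n_backchannels / clip_seconds,
    )

def fit_centroids(examples):
    """examples: list of (ListenerFeatures, label). Returns label -> mean feature vector."""
    sums, counts = {}, {}
    for feats, label in examples:
        vec = feats.as_vector()
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(s / counts[label] for s in acc) for label, acc in sums.items()}

def classify(feats, centroids):
    """Assign the label whose centroid is closest in (unscaled) Euclidean distance."""
    vec = feats.as_vector()
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))
```

In use, one would build `(extract_features(...), label)` pairs from annotated clips with labels such as "attentive listener", "side participant" and "bystander", call `fit_centroids` on them, and then `classify` new clips. Nearest-centroid was picked only because it fits in a few lines; a real system would at least standardise the features, since the backchannel rate lives on a much smaller scale than the two ratios.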

Pages: 107-114

Number of pages: 8