On-Line Multi-Modal Speaker Diarization

Cited by: 0

Authors:

Noulas, Athanasios K. [1]

Krose, Ben J. A. [1]

Affiliations:

[1] Univ Amsterdam, NL-1098 SJ Amsterdam, Netherlands

Source:

ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES | 2007

Keywords:

Audio-Visual Fusion; Speaker Diarization; Online; Switching Models

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper presents a novel framework that utilizes multimodal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multimodal one as more data becomes available. We present an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model. We present the results achieved in various real-world situations, including videos coming from web cameras, human-computer interaction, and video conferences.

Pages: 350-357

Page count: 8
