On-Line Multi-Modal Speaker Diarization

Cited by: 0

Authors:

Noulas, Athanasios K. ^{[1]}

Krose, Ben J. A. ^{[1]}

Affiliations:

[1] Univ Amsterdam, NL-1098 SJ Amsterdam, Netherlands

Source:

ICMI'07: PROCEEDINGS OF THE NINTH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES | 2007

Keywords:

Audio-Visual Fusion; Speaker Diarization; Online; Switching Models;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents a novel framework that utilizes multimodal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multimodal one as more data becomes available. We present an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model. We present the results achieved in various real-world situations, including videos coming from web cameras, human-computer interaction, and video conferences.


Pages: 350-357

Number of pages: 8

