On-Line Multi-Modal Speaker Diarization

Cited by: 0
Authors
Noulas, Athanasios K. [1]
Krose, Ben J. A. [1]
Affiliations
[1] Univ Amsterdam, NL-1098 SJ Amsterdam, Netherlands
Keywords
Audio-Visual Fusion; Speaker Diarization; Online; Switching Models;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
This paper presents a novel framework that uses multimodal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multimodal one as more data becomes available. We present an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model. We present the results achieved in various real-world situations, including videos from web cameras, human-computer interaction, and video conferences.
Pages: 350-357
Page count: 8