On-Line Multi-Modal Speaker Diarization

Cited by: 0
Authors
Noulas, Athanasios K. [1]
Krose, Ben J. A. [1]
Affiliation
[1] Univ Amsterdam, NL-1098 SJ Amsterdam, Netherlands
Keywords
Audio-Visual Fusion; Speaker Diarization; Online; Switching Models
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel framework that utilizes multimodal information to achieve speaker diarization. We use dynamic Bayesian networks to achieve on-line results. We progress from a simple observation model to a complex multimodal one as more data becomes available. We present an efficient way to guide the learning procedure of the complex model using the early results achieved with the simple model. We present the results achieved in various real-world situations, including videos coming from web cameras, human-computer interaction, and video conferences.
Pages: 350-357
Page count: 8
Related papers
  • [1] MSDWILD: MULTI-MODAL SPEAKER DIARIZATION DATASET IN THE WILD
    Liu, Tao
    Fang, Shuai
    Xiang, Xu
    Song, Hongbo
    Lin, Shaoxiong
    Sun, Jiaqi
    Han, Tianyuan
    Chen, Siyuan
    Yao, Binwei
    Liu, Sen
    Wu, Yifei
    Qian, Yanmin
    Yu, Kai
    [J]. INTERSPEECH 2022, 2022, : 1476 - 1480
  • [2] Developing On-Line Speaker Diarization System
    Dimitriadis, Dimitrios
    Fousek, Petr
    [J]. 18TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2017), VOLS 1-6: SITUATED INTERACTION, 2017, : 2739 - 2743
  • [3] Multi-modal segmental models for on-line handwriting recognition
    Artières, T
    Marchand, JM
    Dorizzi, B
    [J]. 15TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS: PATTERN RECOGNITION AND NEURAL NETWORKS, 2000, : 247 - 250
  • [4] A Multi-Modal Learning System for On-Line Surgical Action Segmentation
    De Rossi, Giacomo
    Roin, Serena
    Setti, Francesco
    Muradore, Riccardo
    [J]. 2020 INTERNATIONAL SYMPOSIUM ON MEDICAL ROBOTICS (ISMR), 2020, : 132 - 138
  • [5] Never-ending learning system for on-line speaker diarization
    Markov, Konstantin
    Nakamura, Satoshi
    [J]. 2007 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING, VOLS 1 AND 2, 2007, : 699 - 704
  • [6] MULTI-MODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING COMPRESSED-DOMAIN VIDEO FEATURES
    Friedland, Gerald
    Hung, Hayley
    Yeo, Chuohao
    [J]. 2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009, : 4069 - +
  • [7] Multimodal Multi-Channel On-Line Speaker Diarization Using Sensor Fusion Through SVM
    Minotto, Vicente Peruffo
    Jung, Claudio Rosito
    Lee, Bowon
    [J]. IEEE TRANSACTIONS ON MULTIMEDIA, 2015, 17 (10) : 1694 - 1705
  • [8] LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION
    Liu, Qinghua
    Huang, Yating
    Hao, Yunzhe
    Xu, Jiaming
    Xu, Bo
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022, : 488 - 495
  • [9] Multi-modal biometrics authentication using on-line signature and voice pitch
    Nakagawa, Takehiko
    Nakanishi, Isao
    Itoh, Yoshio
    Fukui, Yutaka
    [J]. 2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2, 2006, : 363 - +
  • [10] MAAS: Multi-modal Assignation for Active Speaker Detection
    Leon Alcazar, Juan
    Heilbron, Fabian Caba
    Thabet, Ali K.
    Ghanem, Bernard
    [J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 265 - 274