Fusion of Acoustic and Prosodic Features for Speaker Clustering

Citations: 0
Authors
Zibert, Janez [1]
Mihelic, France [2]
Affiliations
[1] Univ Primorska, Primorska Inst Nat Sci & Technol, Muzejski Trg 2, SI-6000 Koper, Slovenia
[2] Univ Ljubljana, Fac Elect Engn, Ljubljana 61000, Slovenia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work focuses on speaker clustering methods used in speaker diarization systems. The purpose of speaker clustering is to group together segments that belong to the same speaker. It is usually applied in the last stage of the speaker diarization process. We concentrate on developing suitable representations of speaker segments for clustering and explore different similarity measures for joining speaker segments. We implement two competing systems. The first is a standard approach that uses bottom-up agglomerative clustering with the Bayesian Information Criterion (BIC) as the merging criterion. The second is a fusion speaker clustering system in which speaker segments are modeled by both acoustic and prosodic representations. The idea is to additionally model each speaker's prosodic characteristics and add them to the basic acoustic information estimated from the speaker segments. We construct 10 basic prosodic features derived from the energy of the audio signal, the estimated pitch contours, and the recognized voiced and unvoiced regions of speech. In this way we incorporate higher-level information into the representations of the speaker segments, which leads to improved clustering when speakers have similar acoustic characteristics or when acoustic conditions are poor.
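The baseline described in the abstract, bottom-up agglomerative clustering with BIC as the merging criterion, can be illustrated with a minimal sketch. The abstract does not give the model or feature details, so the following are assumptions: each segment is a matrix of acoustic feature frames (for example cepstral features), each cluster is modeled by a single full-covariance Gaussian, and merging stops when no pair has a negative ΔBIC. The names `delta_bic`, `bic_cluster`, and `penalty_weight` are hypothetical and are not taken from the paper.

```python
import numpy as np


def delta_bic(x, y, penalty_weight=1.0):
    """Delta-BIC for merging two segments (rows = frames, columns = features).

    Assumption: each segment is modeled by one full-covariance Gaussian.
    Negative values favor merging (the segments look like one speaker).
    """
    z = np.vstack([x, y])
    n_x, n_y, n_z = len(x), len(y), len(z)
    d = z.shape[1]

    def logdet(a):
        # Maximum-likelihood covariance with a small ridge for stability.
        cov = np.atleast_2d(np.cov(a, rowvar=False, bias=True))
        return np.linalg.slogdet(cov + 1e-6 * np.eye(d))[1]

    # Number of free parameters of one Gaussian: mean plus covariance.
    n_params = d + d * (d + 1) / 2
    penalty = 0.5 * penalty_weight * n_params * np.log(n_z)
    gain = 0.5 * (n_z * logdet(z) - n_x * logdet(x) - n_y * logdet(y))
    return gain - penalty


def bic_cluster(segments, penalty_weight=1.0):
    """Greedy bottom-up clustering; returns lists of segment indices."""
    clusters = [np.asarray(s, dtype=float) for s in segments]
    labels = [[i] for i in range(len(clusters))]
    while len(clusters) > 1:
        # Find the pair of clusters with the lowest delta-BIC.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], penalty_weight)
                if best is None or score < best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score >= 0:  # no remaining pair is better modeled jointly
            break
        clusters[i] = np.vstack([clusters[i], clusters[j]])
        labels[i] = labels[i] + labels[j]
        del clusters[j], labels[j]
    return labels


if __name__ == "__main__":
    # Synthetic stand-in for acoustic features of two speakers.
    rng = np.random.default_rng(0)
    segs = [rng.normal(loc=m, size=(200, 2)) for m in (0.0, 10.0, 0.0, 10.0)]
    print(bic_cluster(segs))
```

In a fusion system of the kind the abstract describes, prosodic information would contribute a second similarity score alongside this acoustic one. How the two are combined is not stated in the abstract, so it is left out of the sketch.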
Pages: 210 / +
Page count: 3