Fusion of Acoustic and Prosodic Features for Speaker Clustering

Cited by: 0
Authors
Zibert, Janez [1 ]
Mihelic, France [2 ]
Affiliations
[1] Univ Primorska, Primorska Inst Nat Sci & Technol, Muzejski Trg 2, SI-6000 Koper, Slovenia
[2] Univ Ljubljana, Fac Elect Engn, Ljubljana 61000, Slovenia
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work focuses on speaker clustering methods that are used in speaker diarization systems. The purpose of speaker clustering is to group together segments that belong to the same speaker. It is usually applied in the last stage of the speaker diarization process. We concentrate on developing suitable representations of speaker segments for clustering and explore different similarity measures for joining speaker segments together. We realize two competing systems. The first is a standard approach using bottom-up agglomerative clustering with the Bayesian Information Criterion (BIC) as the merging criterion. In the second approach, a fusion speaker clustering system is developed in which the speaker segments are modeled by acoustic and prosodic representations. The idea here is to additionally model the speaker's prosodic characteristics and add them to the basic acoustic information estimated from the speaker segments. We construct 10 basic prosodic features derived from the energy of the audio signals, the estimated pitch contours, and the recognized voiced and unvoiced regions in speech. In this way we incorporate higher-level information into the representations of the speaker segments, which leads to improved clustering of the segments when speakers have similar acoustic characteristics or when acoustic conditions are poor.
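The baseline described in the abstract, bottom-up agglomerative clustering with BIC as the merging criterion, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each segment is a list of feature vectors modeled by a single diagonal-covariance Gaussian (a simplification chosen to keep the example dependency-free), and the names `log_det_diag`, `delta_bic`, `bic_cluster`, and the penalty weight `lam` are hypothetical. Under the sign convention used here, a negative ΔBIC favors merging two clusters.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal ML covariance of a list of feature vectors."""
    n = len(frames)
    total = 0.0
    for dim in zip(*frames):
        mean = sum(dim) / n
        var = sum((v - mean) ** 2 for v in dim) / n
        total += math.log(var + 1e-6)  # small floor avoids log(0)
    return total


def delta_bic(x, y, lam=1.0):
    """Delta-BIC between modeling x and y jointly versus separately.

    Negative values mean one shared Gaussian is preferred (merge).
    Assumption: diagonal covariance, so each model has 2*d parameters.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    d = len(x[0])
    gain = 0.5 * (n * log_det_diag(x + y)
                  - n1 * log_det_diag(x)
                  - n2 * log_det_diag(y))
    penalty = 0.5 * (2 * d) * math.log(n)  # d means + d variances
    return gain - lam * penalty


def bic_cluster(segments, lam=1.0):
    """Greedy bottom-up clustering: merge the pair with the lowest delta-BIC
    until no pair has a negative score. Returns lists of segment indices."""
    clusters = [list(s) for s in segments]
    members = [[i] for i in range(len(segments))]
    while len(clusters) > 1:
        score, i, j = min(
            (delta_bic(clusters[i], clusters[j], lam), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if score >= 0:  # no remaining pair is better modeled jointly
            break
        clusters[i] = clusters[i] + clusters[j]
        members[i] = members[i] + members[j]
        del clusters[j]
        del members[j]
    return members
```

In a fusion system of the kind the abstract outlines, a score such as `delta_bic` computed on acoustic features would be combined with a similarity score computed on prosodic features. The abstract does not specify the combination rule, so none is sketched here.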
Pages: 210 / +
Number of pages: 3