HIDDEN MARKOV MODEL DIARISATION WITH SPEAKER LOCATION INFORMATION

Cited by: 3
Authors
Wong, Jeremy H. M. [1 ]
Xiao, Xiong [1 ]
Gong, Yifan [1 ]
Affiliations
[1] Microsoft Speech & Language Grp, Singapore, Singapore
Keywords
Speaker location; sound source localisation; hidden Markov model; diarisation; meeting transcription; DIARIZATION;
DOI
10.1109/ICASSP39728.2021.9413761
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Speaker diarisation methods often rely on speaker embeddings to cluster together the segments of audio that are uttered by the same speaker. When the audio is captured using a microphone array, it is possible to estimate the locations from which the sounds originate. This location information may be complementary to the speaker embeddings in the diarisation process. This report proposes to extend the Hidden Markov Model (HMM) clustering method to enable the use of speaker location information. The HMM observation log-likelihood for the speaker location can take the form of a KL-divergence when the speaker location is represented as a discrete posterior distribution over the possible locations from which the sound may have originated. Experimental results on a Microsoft rich meeting transcription task show that using speaker location information with the proposed HMM modification can yield performance improvements over using speaker embeddings alone.
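As a rough illustration of the KL-divergence idea in the abstract, the sketch below scores an observed location posterior against a per-speaker location distribution using the negative KL divergence. This is not the authors' formulation: the function name `neg_kl_score`, the four-direction grid, and the example distributions are all hypothetical, and the record does not specify how the paper combines this term with the speaker-embedding likelihood.

```python
import math


def neg_kl_score(location_posterior, speaker_location_dist, eps=1e-12):
    """Return -KL(q || p) for two discrete distributions over the same locations.

    q is the observed location posterior for one audio segment and p is a
    (hypothetical) location distribution attached to one HMM speaker state.
    A score closer to zero means the segment's location matches the state better.
    """
    score = 0.0
    for q, p in zip(location_posterior, speaker_location_dist):
        if q > 0.0:  # terms with q == 0 contribute nothing to the KL divergence
            score -= q * math.log(q / max(p, eps))  # eps guards against log(q / 0)
    return score


# Hypothetical example: posteriors over 4 candidate sound directions.
observed = [0.7, 0.2, 0.05, 0.05]
speaker_a = [0.6, 0.3, 0.05, 0.05]   # assumed to sit near direction 0
speaker_b = [0.05, 0.05, 0.2, 0.7]   # assumed to sit near direction 3

score_a = neg_kl_score(observed, speaker_a)
score_b = neg_kl_score(observed, speaker_b)
```

In this toy setup `score_a` is larger than `score_b`, so a location-only score would favour assigning the segment to speaker A. The negative KL divergence differs from the expected categorical log-likelihood of the locations only by the entropy of the observed posterior, which does not depend on the speaker state.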
Pages: 7158-7162
Number of pages: 5