Using Phoneme Recognition and Text-dependent Speaker Verification to Improve Speaker Segmentation for Chinese Speech

Cited by: 0
Authors
Wang, Gang [1]
Wu, Xiaojun [1]
Zheng, Thomas Fang [1]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat &, Beijing 100084, Peoples R China
Keywords
speaker segmentation; phoneme recognition; text-dependent; short utterances; DIARIZATION; MODELS;
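DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract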
Speaker segmentation is widely used in tasks such as multi-speaker detection and speaker tracking. Segmentation performance depends to a large extent on the performance of speaker verification (SV) between two short utterances, so improving SV for short utterances should substantially improve segmentation. This paper proposes a method based on phoneme recognition and text-dependent speaker recognition. During segmentation, a phoneme sequence is first recognized using a phoneme recognizer, and then text-dependent speaker recognition based on dynamic time warping (DTW) is performed on the same phoneme in two adjacent windows. Experiments on the Chinese Corpus Consortium (CCC) MSS database showed better performance than the Bayesian information criterion (BIC) method and the generalized likelihood ratio (GLR) method.
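The DTW comparison of same-phoneme segments described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `dtw_distance` and `window_distance`, the Euclidean local cost, the length normalization, and the averaging over shared phonemes are all assumptions made for illustration, since the abstract does not specify the features or the decision rule.

```python
import math


def dtw_distance(a, b):
    """Length-normalized DTW distance between two sequences of feature vectors.

    Assumption: Euclidean distance is used as the local frame cost.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def window_distance(left, right):
    """Average DTW distance over phonemes that occur in both windows.

    left, right: lists of (phoneme_label, frames) pairs, as a hypothetical
    phoneme recognizer might produce them. Returns None when the two
    windows share no phoneme.
    """
    scores = [
        dtw_distance(frames_l, frames_r)
        for label_l, frames_l in left
        for label_r, frames_r in right
        if label_l == label_r
    ]
    return sum(scores) / len(scores) if scores else None
```

In such a sketch, a large `window_distance` between two adjacent windows would suggest a speaker change between them; the threshold for that decision would have to be tuned on held-out data.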
Pages: 1457-1460
Page count: 4