Using Phoneme Recognition and Text-dependent Speaker Verification to Improve Speaker Segmentation for Chinese Speech

Times cited: 0

Authors:

Wang, Gang [1]

Wu, Xiaojun [1]

Zheng, Thomas Fang [1]

Affiliation:

[1] Tsinghua Univ, Dept Comp Sci & Technol, Tsinghua Natl Lab Informat Sci & Technol, Ctr Speech & Language Technol, Div Tech Innovat &, Beijing 100084, Peoples R China

Source:

11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 | 2010

Keywords:

speaker segmentation; phoneme recognition; text-dependent; short utterances; diarization; models

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Speaker segmentation is widely used in tasks such as multi-speaker detection and speaker tracking. Segmentation performance depends largely on the performance of speaker verification (SV) between two short utterances, so improving SV performance on short utterances can substantially improve segmentation. In this paper, a method based on phoneme recognition and text-dependent speaker recognition is proposed. During segmentation, a phoneme sequence is first recognized using a phoneme recognizer, and then text-dependent speaker recognition based on dynamic time warping (DTW) is performed on the same phoneme in two adjacent windows. Experiments on the Chinese Corpus Consortium (CCC) MSS database showed that the proposed method achieved better performance than both the BIC method and the GLR method.
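The pipeline described in the abstract (recognize phonemes, then run a DTW-based text-dependent comparison on the same phoneme in two adjacent windows) can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration, not the authors' implementation: each window is taken to be a list of hypothetical `(phoneme_label, frames)` pairs already produced by some phoneme recognizer (not implemented here), `frames` is a 2-D NumPy array of acoustic feature vectors (for example MFCCs), the local cost is Euclidean distance, the DTW score is normalized by `n + m`, scores over matched phoneme pairs are simply averaged, and a speaker change is declared when that average exceeds a hypothetical `threshold`.

```python
import numpy as np


def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Length-normalized DTW distance between two (frames x dims) feature arrays.

    Assumption: Euclidean local cost and normalization by n + m; the exact
    local distance and normalization used in the paper are not given here.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    acc = np.full((n + 1, m + 1), np.inf)  # accumulated-cost matrix
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(x[i - 1] - y[j - 1])
            # Standard step pattern: vertical, horizontal, or diagonal move.
            acc[i, j] = local + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m] / (n + m))


def window_distance(win_a, win_b):
    """Average DTW distance over phoneme segments that share a label.

    win_a / win_b: hypothetical lists of (phoneme_label, frames) pairs.
    Returns None when the windows share no phoneme, i.e. no
    text-dependent comparison is possible.
    """
    dists = [
        dtw_distance(feats_a, feats_b)
        for ph_a, feats_a in win_a
        for ph_b, feats_b in win_b
        if ph_a == ph_b
    ]
    return float(np.mean(dists)) if dists else None


def detect_changes(windows, threshold):
    """Return indices k where a speaker change is hypothesized between
    windows[k - 1] and windows[k] (hypothetical thresholding rule)."""
    changes = []
    for k in range(1, len(windows)):
        d = window_distance(windows[k - 1], windows[k])
        if d is not None and d > threshold:
            changes.append(k)
    return changes


# Toy usage with synthetic features: the third window mimics a different speaker.
w0 = [("a", np.zeros((5, 2))), ("i", np.zeros((4, 2)))]
w1 = [("a", np.zeros((6, 2))), ("u", np.ones((3, 2)))]
w2 = [("a", np.full((5, 2), 5.0))]
print(detect_changes([w0, w1, w2], threshold=1.0))  # [2]
```

Because DTW absorbs differences in duration, two renditions of the same phoneme by the same speaker can score near zero even when they differ in length, which is the property that makes phoneme-matched comparison attractive for short utterances. In this sketch, adjacent windows that share no phoneme are simply skipped; a practical system would need a fallback for that case and a threshold tuned on development data.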

Pages: 1457-1460

Number of pages: 4