Multimodal Speaker Identification Based on Text and Speech

Cited by: 0

Authors:

Moschonas, Panagiotis [1]

Kotropoulos, Constantine [1]

Affiliations:

[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece

Source:

BIOMETRICS AND IDENTITY MANAGEMENT | 2008 / Vol. 5372

Keywords:

multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means of modeling each speaker's vocabulary with a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text, and one by a nearest neighbor classifier applied to the local MFCC histograms. Speaker identification experiments on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus, distributed by the Linguistic Data Consortium, demonstrate that a convex combination of the two scores is more accurate than either score alone.
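The pipeline outlined in the abstract (quantized histograms, a nearest neighbor score, and a convex combination of two scores) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-width binning, the Euclidean distance, the `1 / (1 + d)` distance-to-score mapping, and the weight `alpha` are all assumptions made for illustration, and the text scores are supplied directly rather than computed by PLSI.

```python
from math import sqrt


def quantize_histogram(values, lo, hi, n_bins):
    """Normalized histogram of `values` over `n_bins` equal-width bins on [lo, hi].

    Assumption: equal-width bins; out-of-range values fall into the edge bins.
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = min(max(idx, 0), n_bins - 1)  # clamp into the valid bin range
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts] if total else [0.0] * n_bins


def euclidean(a, b):
    """Euclidean distance between two equal-length histograms."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbor_scores(query_hist, train_hists):
    """Per-speaker score from the closest training histogram.

    Assumption: distance d is mapped to a similarity 1 / (1 + d),
    so an exact match scores 1.0.
    """
    scores = {}
    for speaker, hists in train_hists.items():
        d = min(euclidean(query_hist, h) for h in hists)
        scores[speaker] = 1.0 / (1.0 + d)
    return scores


def fuse_scores(text_scores, speech_scores, alpha):
    """Convex combination: alpha * text + (1 - alpha) * speech, 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        spk: alpha * text_scores[spk] + (1.0 - alpha) * speech_scores[spk]
        for spk in text_scores
    }


def identify(scores):
    """Return the speaker with the highest fused score."""
    return max(scores, key=scores.get)


# Toy usage with made-up numbers (hypothetical speakers "A" and "B").
train = {"A": [[1.0, 0.0]], "B": [[0.0, 1.0]]}
query = quantize_histogram([0.1, 0.2, 0.3], lo=0.0, hi=1.0, n_bins=2)
speech = nearest_neighbor_scores(query, train)
text = {"A": 0.8, "B": 0.2}  # stand-in for PLSI-derived text scores
print(identify(fuse_scores(text, speech, alpha=0.5)))
```

Setting `alpha` to 1.0 or 0.0 recovers the text-only or speech-only decision, so the two individual scores are special cases of the fused one; in practice the weight would be tuned on held-out data.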


Pages: 100-109

Page count: 10


