Multimodal Speaker Identification Based on Text and Speech

Cited by: 0
Authors
Moschonas, Panagiotis [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which offers a powerful means of modeling each speaker's vocabulary through a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text and one by a nearest neighbor classifier applied to the local MFCC histograms. A convex combination of the two scores is demonstrated to be more accurate than either individual score in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
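The speech branch and the score fusion described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the histogram distance, how distances become scores, or the fusion weight, so the L1 distance, the `1 / (1 + d)` similarity, the score normalization, and the weight `alpha` are all assumptions. The function names are hypothetical, and the PLSI text scores are taken as given rather than computed.

```python
from typing import Dict, List, Sequence


def local_histogram(frames: Sequence[Sequence[float]], n_bins: int,
                    lo: float, hi: float) -> List[float]:
    """Quantize each coefficient's range [lo, hi] into n_bins and return the
    concatenated per-coefficient histograms, normalized by the frame count."""
    n_coeffs = len(frames[0])
    hist = [0.0] * (n_coeffs * n_bins)
    width = (hi - lo) / n_bins
    for frame in frames:
        for c, value in enumerate(frame):
            b = int((value - lo) / width)
            b = min(max(b, 0), n_bins - 1)  # clamp out-of-range values
            hist[c * n_bins + b] += 1.0
    n_frames = float(len(frames))
    return [count / n_frames for count in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L1 distance between two histograms (an assumed choice of metric)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nn_scores(query: Sequence[float],
              train: Dict[str, List[List[float]]]) -> Dict[str, float]:
    """Score each speaker by its nearest training histogram.
    The 1 / (1 + d) similarity and sum-to-one normalization are assumptions."""
    sims = {spk: 1.0 / (1.0 + min(l1_distance(query, h) for h in hists))
            for spk, hists in train.items()}
    total = sum(sims.values())
    return {spk: s / total for spk, s in sims.items()}


def fuse(text_scores: Dict[str, float], speech_scores: Dict[str, float],
         alpha: float) -> Dict[str, float]:
    """Convex combination: alpha * text score + (1 - alpha) * speech score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {spk: alpha * text_scores[spk] + (1.0 - alpha) * speech_scores[spk]
            for spk in text_scores}


def identify(scores: Dict[str, float]) -> str:
    """Return the speaker with the highest score."""
    return max(scores, key=scores.get)
```

With `alpha = 1` the decision rests on the text score alone and with `alpha = 0` on the speech score alone; intermediate values trade the two off. In practice `alpha` would presumably be tuned on held-out data, which the abstract does not describe.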
Pages: 100-109
Page count: 10