Multimodal Speaker Identification Based on Text and Speech

Authors
Moschonas, Panagiotis [1]
Kotropoulos, Constantine [1]
Affiliations
[1] Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece
Keywords
multimodal speaker identification; text; speech; probabilistic latent semantic indexing; Mel-frequency cepstral coefficients; nearest neighbor classifier; convex combination;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a novel method for speaker identification based on both speech utterances and their transcribed text. The transcribed text of each speaker's utterance is processed by probabilistic latent semantic indexing (PLSI), which models each speaker's vocabulary through a number of hidden topics that are closely related to his/her identity, function, or expertise. Mel-frequency cepstral coefficients (MFCCs) are extracted from each speech frame, and their dynamic range is quantized into a number of predefined bins in order to compute local MFCC histograms for each speech utterance, which is time-aligned with the transcribed text. Two identity scores are computed independently: one by PLSI applied to the text and one by a nearest neighbor classifier applied to the local MFCC histograms. A convex combination of the two scores is shown to be more accurate than either individual score in speaker identification experiments conducted on broadcast news from the RT-03 MDE Training Data Text and Annotations corpus distributed by the Linguistic Data Consortium.
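The speech branch and the score fusion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin count, the dynamic range `[lo, hi]`, the L1 distance, the `exp(-distance)` mapping from distance to score, the weight `alpha`, and all function names are assumptions made for illustration, and the text scores are a hypothetical stand-in for the PLSI output, which is not implemented here.

```python
import numpy as np


def mfcc_histogram(mfcc, n_bins=4, lo=-2.0, hi=2.0):
    """Quantize each coefficient's range [lo, hi] into n_bins and return the
    concatenated, normalized per-coefficient histograms for one utterance.

    mfcc has shape (frames, coefficients) with at least one frame.
    n_bins, lo and hi are illustrative values, not taken from the paper.
    """
    mfcc = np.clip(np.asarray(mfcc, dtype=float), lo, hi)
    hists = []
    for c in range(mfcc.shape[1]):
        counts, _ = np.histogram(mfcc[:, c], bins=n_bins, range=(lo, hi))
        hists.append(counts / counts.sum())
    return np.concatenate(hists)


def nn_speech_scores(test_hist, train_hists, train_labels, speakers):
    """Score each speaker by the nearest training histogram (L1 distance).

    Assumption: distances are mapped to scores with exp(-d) and normalized
    to sum to one so they are comparable with the text scores.
    """
    dists = np.abs(np.asarray(train_hists) - test_hist).sum(axis=1)
    labels = np.asarray(train_labels)
    nearest = np.array([dists[labels == s].min() for s in speakers])
    sim = np.exp(-nearest)
    return sim / sim.sum()


def fuse_scores(text_scores, speech_scores, alpha):
    """Convex combination: alpha * text + (1 - alpha) * speech, 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * np.asarray(text_scores) + (1.0 - alpha) * np.asarray(speech_scores)


# Toy data: two speakers with one training utterance each.
speakers = ["A", "B"]
train_hists = np.stack([
    mfcc_histogram(np.full((5, 2), -1.0)),  # speaker A
    mfcc_histogram(np.full((5, 2), 1.0)),   # speaker B
])
train_labels = ["A", "B"]

test_hist = mfcc_histogram(np.full((4, 2), 1.2))
speech_scores = nn_speech_scores(test_hist, train_hists, train_labels, speakers)

# Hypothetical per-speaker text scores standing in for the PLSI branch.
text_scores = np.array([0.6, 0.4])

fused = fuse_scores(text_scores, speech_scores, alpha=0.5)
predicted = speakers[int(np.argmax(fused))]
```

In this toy case the text scores alone favor speaker A, while the fused scores favor speaker B because the speech evidence is strong. In practice `alpha` would be tuned on held-out data.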
Pages: 100-109
Page count: 10