Probabilistic Latent Speaker Analysis for Large Vocabulary Speech Recognition

Cited by: 0
Authors
Su, Dan [1]
Wu, Xihong [1]
Chi, Huisheng [1]
Affiliations
[1] Peking Univ, Speech & Hearing Res Ctr, State Key Lab Machine Percept, Beijing 100871, Peoples R China
Keywords
speech recognition; PLSA; trajectory folding phenomenon; speaker variation
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The trajectory folding problem is intrinsic to HMM-based speech recognition systems in which each state is modeled by a mixture of Gaussian components. In this paper, a probabilistic latent semantic analysis (PLSA)-based approach is proposed to alleviate this problem in speech recognition systems. The basic idea is that different speech trajectories are strongly correlated with speaker variation, and that different speakers may consistently score highly on certain Gaussian components. PLSA is therefore adopted to perform co-occurrence analysis between Gaussian components and speakers, providing an additional source of information to constrain the search path during decoding. Experimental results show relative word error rate reductions of 11.2% and 2.7% on a homogeneous test set and the 2004 863 evaluation set, respectively.
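The co-occurrence analysis mentioned in the abstract can be illustrated with a generic PLSA fit by EM. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name plsa, the toy count matrix, and the choice of two latent classes are all hypothetical, and the record does not say how the counts are collected or how the result enters the decoder. Here each row is a speaker, each column a Gaussian component, and each entry a count of how often that component scored highly for that speaker.

```python
import numpy as np

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a (speakers x Gaussian components) count matrix.

    Returns P(z|s) with shape (S, Z) and P(g|z) with shape (Z, G).
    """
    rng = np.random.default_rng(seed)
    n_spk, n_comp = counts.shape
    # Random, row-normalised initialisation of both conditional tables.
    p_z_s = rng.random((n_spk, n_topics))
    p_z_s /= p_z_s.sum(axis=1, keepdims=True)
    p_g_z = rng.random((n_topics, n_comp))
    p_g_z /= p_g_z.sum(axis=1, keepdims=True)
    eps = 1e-12  # guards against division by zero
    for _ in range(n_iter):
        # E-step: posterior P(z|s,g) proportional to P(z|s) * P(g|z).
        joint = p_z_s[:, :, None] * p_g_z[None, :, :]      # (S, Z, G)
        post = joint / (joint.sum(axis=1, keepdims=True) + eps)
        # M-step: re-estimate both tables from expected counts.
        weighted = counts[:, None, :] * post               # (S, Z, G)
        p_g_z = weighted.sum(axis=0)
        p_g_z /= p_g_z.sum(axis=1, keepdims=True) + eps
        p_z_s = weighted.sum(axis=2)
        p_z_s /= p_z_s.sum(axis=1, keepdims=True) + eps
    return p_z_s, p_g_z

# Hypothetical toy data: 4 speakers, 4 Gaussian components.
counts = np.array([[20, 18, 0, 0],
                   [22, 19, 1, 0],
                   [0, 1, 21, 17],
                   [0, 0, 19, 23]], dtype=float)
p_z_s, p_g_z = plsa(counts, n_topics=2)
# Smoothed speaker-conditional component distribution P(g|s).
p_g_s = p_z_s @ p_g_z
```

The product p_z_s @ p_g_z gives a smoothed estimate of P(g|s), the kind of speaker-dependent component preference that could, in principle, serve as an extra knowledge source when weighting Gaussian components during search.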
Pages: 1889-1892
Page count: 4
Related papers
50 in total
  • [1] Probabilistic Latent Speaker Training for Large Vocabulary Speech Recognition
    Su, Dan
    Wu, Xihong
    Chi, Huisheng
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 1225 - 1228
  • [2] Large Vocabulary Speech Recognition: Speaker Dependent and Speaker Independent
    Hemakumar, G.
    Punitha, P.
    [J]. INFORMATION SYSTEMS DESIGN AND INTELLIGENT APPLICATIONS, VOL 1, 2015, 339 : 73 - 80
  • [3] Probabilistic Speaker-Class based Acoustic Modeling for Large Vocabulary Continuous Speech Recognition
    Li, Xiangang
    Su, Dan
    Pang, Zaihu
    Wu, Xihong
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1218 - 1221
  • [4] Speaker verification through large vocabulary continuous speech recognition
    Newman, M
    Gillick, L
    Ito, Y
    McAllaster, D
    Peskin, B
    [J]. ICSLP 96 - FOURTH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, PROCEEDINGS, VOLS 1-4, 1996, : 2419 - 2422
  • [5] Speaker selection training for large vocabulary continuous speech recognition
    Huang, C
    Chen, T
    Chang, E
    [J]. 2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002, : 609 - 612
  • [6] Experiments in speaker normalisation and adaptation for large vocabulary speech recognition
    Pye, D
    Woodland, PC
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1047 - 1050
  • [7] Speaker clustering and transformation for speaker adaptation in large-vocabulary speech recognition systems
    Padmanabhan, M
    Bahl, LR
    Nahamoo, D
    Picheny, MA
    [J]. 1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 701 - 704
  • [8] Speaker adaptation in the philips system for large vocabulary continuous speech recognition
    Thelen, E
    Aubert, X
    Beyerlein, P
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1035 - 1038
  • [9] ON LARGE-VOCABULARY SPEAKER-INDEPENDENT CONTINUOUS SPEECH RECOGNITION
    LEE, KF
    [J]. SPEECH COMMUNICATION, 1988, 7 (04) : 375 - 379
  • [10] DSP-based large vocabulary speaker-independent speech recognition
    Hirayama, H
    Yoshida, K
    Koga, S
    Hattori, H
    [J]. NEC RESEARCH & DEVELOPMENT, 1996, 37 (04) : 528 - 534