Multi-resolution phonetic/segmental features and models for HMM-based speech recognition

Cited by: 0
Authors
Vaseghi, S
Harte, N
Milner, B
Affiliations
Keywords
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
This paper explores the modelling of phonetic segments of speech with multi-resolution spectral/time correlates. For spectral representation, a set of multi-resolution cepstral features is proposed. Cepstral features obtained from a DCT of the log energy spectrum over the full voice bandwidth (100-4000 Hz) are combined with higher-resolution features obtained from the DCT of the lower subband (say 100-2100 Hz) and upper subband (2100-4000 Hz) halves. This approach can be extended to several levels of different resolutions. To represent the temporal structure of speech segments or phonetic units, the conventional cepstral and dynamic cepstral features, which represent speech at the sub-phonetic level, are supplemented by a set of phonetic features that describe the trajectory of speech over the duration of a phonetic unit. A conditional probability model for phonetic and sub-phonetic features is considered. Experiments demonstrate that the inclusion of the segmental features results in about a 10% decrease in error rates.
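The multi-resolution cepstral idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the input is a vector of log filter-bank energies covering the full band, that splitting the channels at the midpoint approximates the 100-2100 Hz / 2100-4000 Hz division, and that an unnormalised DCT-II is used. The function names `dct2` and `multires_cepstra` and the coefficient counts `n_full` and `n_sub` are hypothetical.

```python
import math

def dct2(x, n_coeffs):
    """Return the first n_coeffs coefficients of an unnormalised DCT-II of x."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(n_coeffs)
    ]

def multires_cepstra(log_energies, n_full=4, n_sub=2):
    """Concatenate full-band cepstra with cepstra of the two subband halves.

    Assumption: the channel midpoint stands in for the subband boundary.
    """
    half = len(log_energies) // 2
    lower, upper = log_energies[:half], log_energies[half:]
    return (
        dct2(log_energies, n_full)  # coarse resolution: full band
        + dct2(lower, n_sub)        # finer resolution: lower subband
        + dct2(upper, n_sub)        # finer resolution: upper subband
    )

# Toy log filter-bank energies for one frame (made-up values).
log_e = [math.log(e) for e in [1.0, 2.0, 4.0, 8.0, 8.0, 4.0, 2.0, 1.0]]
features = multires_cepstra(log_e)
print(len(features))  # n_full + 2 * n_sub coefficients
```

Extending the scheme to further resolution levels, as the abstract suggests, would amount to splitting each half again and appending the DCT of each quarter band.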
Pages: 1263-1266
Page count: 4
Related papers
50 in total
  • [1] Multi-resolution sub-band features and models for HMM-based phonetic modelling
    McCourt, PM
    Vaseghi, SV
    Doherty, B
    [J]. COMPUTER SPEECH AND LANGUAGE, 2000, 14 (03) : 241 - 259
  • [2] Peripheral features for HMM-based speech recognition
    Fukuda, T
    Takigawa, M
    Nitta, T
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS, 2001, : 129 - 132
  • [3] Use of voicing features in HMM-based speech recognition
    Thomson, DL
    Chengalvarayan, R
    [J]. SPEECH COMMUNICATION, 2002, 37 (3-4) : 197 - 211
  • [4] PHONETIC SEGMENTATION OF EMOTIONAL SPEECH WITH HMM-BASED METHODS
    Mporas, Iosif
    Ganchev, Todor
    Fakotakis, Nikos
    [J]. INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2010, 24 (07) : 1159 - 1179
  • [5] An HMM-based speech recognition IC
    Han, W
    Hon, KW
    Chan, CF
    Lee, T
    Choy, CS
    Pun, KP
    Ching, PC
    [J]. PROCEEDINGS OF THE 2003 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOL II: COMMUNICATIONS-MULTIMEDIA SYSTEMS & APPLICATIONS, 2003, : 744 - 747
  • [6] HMM-based phonetic engine for continuous speech of a regional language
    Kaur, Rupinderdeep
    Sharma, R. K.
    Kumar, Parteek
    [J]. MODERN PHYSICS LETTERS B, 2019, 33 (24):
  • [7] DIRICHLET MIXTURE MODELS OF NEURAL NET POSTERIORS FOR HMM-BASED SPEECH RECOGNITION
    Balakrishnan, V
    Sivaram, G. S. V. S.
    Khudanpur, Sanjeev
    [J]. 2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 5028 - 5031
  • [8] Modified Viterbi Scoring for HMM-Based Speech Recognition
    Jo, Jihyuck
    Kim, Han-Gyu
    Park, In-Cheol
    Jung, Bang Chul
    Yoo, Hoyoung
    [J]. INTELLIGENT AUTOMATION AND SOFT COMPUTING, 2019, 25 (02) : 351 - 358
  • [9] Normalized training for HMM-based visual speech recognition
    Nankaku, Yoshihiko
    Tokuda, Keiichi
    Kitamura, Tadashi
    Kobayashi, Takao
    [J]. ELECTRONICS AND COMMUNICATIONS IN JAPAN PART III-FUNDAMENTAL ELECTRONIC SCIENCE, 2006, 89 (11) : 40 - 50
  • [10] Simplified scoring methods for HMM-based speech recognition
    Paramonov, Pavel
    Sutula, Nadezhda
    [J]. SOFT COMPUTING, 2016, 20 (09) : 3455 - 3460