Multi-resolution sub-band features and models for HMM-based phonetic modelling

Cited by: 1
Authors
McCourt, PM [1 ]
Vaseghi, SV [1 ]
Doherty, B [1 ]
Affiliations
[1] Queens Univ Belfast, Sch Elect & Elect Engn, Belfast, Antrim, North Ireland
Source
COMPUTER SPEECH AND LANGUAGE | 2000 / Volume 14 / Issue 03
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
DOI
10.1006/csla.2000.0145
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
HMM acoustic models are typically trained on a single set of cepstral features extracted over the full bandwidth of mel-spaced filterbank energies. In this paper, multi-resolution sub-band transformations of the log energy spectra are introduced, based on the conjecture that additional cues for phonetic discrimination may exist in local spectral correlates not captured by the full-band analysis. In this approach, the discriminative contribution from sub-band features is considered to supplement rather than substitute for full-band features. HMMs trained on concatenated multi-resolution cepstral features are investigated, along with models based on linearly combined independent multi-resolution streams, in which the sub-band and full-band streams represent different resolutions of the same signal. For the stream-based models, discriminative training of the linear combination weights to a minimum classification error criterion is also applied. Both the concatenated feature and the independent stream modelling configurations are demonstrated to outperform traditional full-band cepstra for HMM-based acoustic phonetic modelling on the TIMIT database. In context-independent modelling experiments, the best result raises accuracy on the core test set from 62.3% for full-band models to 67.5% for discriminatively weighted multi-resolution sub-band modelling. A triphone accuracy of 73.9% achieved on the core test set improves notably on full-band cepstra and compares well with results previously published on this task. © 2000 Academic Press.
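The two configurations described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 24-channel filterbank, the split into one full band plus two half bands, the coefficient counts, and the function names `dct_ii`, `multires_cepstra` and `combine_streams` are all assumptions made for illustration. The stream combination is shown as a plain weighted sum of per-stream log-likelihoods, and the minimum classification error training of the weights is not shown.

```python
import numpy as np

def dct_ii(x, n_coeffs):
    """Orthonormal DCT-II of a 1-D vector, truncated to n_coeffs coefficients."""
    n = len(x)
    k = np.arange(n_coeffs)[:, None]          # coefficient index
    m = np.arange(n)[None, :]                 # channel index
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    scale = np.full(n_coeffs, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)

def multires_cepstra(log_energies, bands_per_level=(1, 2), coeffs_per_band=(12, 6)):
    """Concatenate cepstra computed at several resolutions (hypothetical layout).

    Level 0 uses the full band; level 1 splits the log filterbank energies
    into two sub-bands and applies a separate DCT to each.
    """
    feats = []
    for n_bands, n_coeffs in zip(bands_per_level, coeffs_per_band):
        for band in np.array_split(log_energies, n_bands):
            feats.append(dct_ii(band, n_coeffs))
    return np.concatenate(feats)

def combine_streams(stream_log_likelihoods, weights):
    """Linearly combine per-stream log-likelihoods with the given weights."""
    return float(np.dot(weights, stream_log_likelihoods))

# Illustrative input: 24 log filterbank energies for one frame (made-up values).
frame = np.log(np.arange(1.0, 25.0))
features = multires_cepstra(frame)            # 12 full-band + 2 * 6 sub-band coefficients
score = combine_streams([-10.0, -20.0], [0.5, 0.5])
```

In the concatenated configuration a single HMM would be trained on `features`; in the stream configuration each resolution would have its own model, and only the scores would be merged as in `combine_streams`.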
Pages: 241-259
Page count: 19
Related papers
50 in total
  • [1] Multi-resolution phonetic/segmental features and models for HMM-based speech recognition
    Vaseghi, S
    Harte, N
    Milner, B
    [J]. 1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1263 - 1266
  • [2] Discriminative multi-resolution sub-band and segmental phonetic model combination
    McCourt, P
    Harte, N
    Vaseghi, S
    [J]. ELECTRONICS LETTERS, 2000, 36 (03) : 270 - 271
  • [3] Detection of ECG signal based on multi-resolution sub-band filter
    Zhang, Wei
    Wang, Xu
    Ge, Linlin
    Zhang, Zhuo
    [J]. 2005 27TH ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY, VOLS 1-7, 2005, : 2714 - 2717
  • [4] HMM-based speech synthesis using sub-band basis spectrum model
    Ohtani, Yamato
    Tamura, Masatsune
    Morita, Masahiro
    Kagoshima, Takehiko
    Akamine, Masami
    [J]. 13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1438 - 1441
  • [5] Singular Value Decomposition Based Sub-band Decomposition and Multi-resolution (SVD-SBD-MRR) Representation of Digital Colour Images
    Singh, Satish Kumar
    Kumar, Shishir
    [J]. PERTANIKA JOURNAL OF SCIENCE AND TECHNOLOGY, 2011, 19 (02): : 229 - 235
  • [6] Wavelet based robust sub-band features for phoneme recognition
    Farooq, O
    Datta, S
    [J]. IEE PROCEEDINGS-VISION IMAGE AND SIGNAL PROCESSING, 2004, 151 (03): : 187 - 193
  • [7] Bird sound detection based on sub-band features and the perceptron model
    Han, Xue
    Peng, Jianxin
    [J]. APPLIED ACOUSTICS, 2024, 217
  • [8] The performance analysis of chinese speech endpoint detection based on continuous multi sub-band spectral features
    He, SN
    Yu, JB
    [J]. 2002 INTERNATIONAL CONFERENCE ON COMMUNICATIONS, CIRCUITS AND SYSTEMS AND WEST SINO EXPOSITION PROCEEDINGS, VOLS 1-4, 2002, : 997 - 1002
  • [9] HMM-based speech enhancement using sub-word models and noise adaptation
    Kato, Akihiro
    Milner, Ben
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 3748 - 3752
  • [10] Multi-resolution curve alignment based on salient features
    Li, Zheng
    Luo, Xiaonan
    Gao, Chengying
    [J]. 18TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION, VOL 2, PROCEEDINGS, 2006, : 357 - 360