Use of Spectral Centre of Gravity for Generating Speaker Invariant Features for Automatic Speech Recognition

Cited by: 0
Authors
Sanand, D. R. [1 ]
Balaji, V. [1 ]
Rani, R. Sandhya [1 ]
Umesh, S. [1 ]
Affiliation
[1] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
Keywords
Spectral Centre of Gravity; Speaker Normalisation; VTLN; Automatic Speech Recognition;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we present an approach to generating speaker-invariant features for automatic speech recognition (ASR) using the idea of the spectral centre of gravity (CG). It is based on the observation that if two signals are delayed versions of one another, then their CGs differ by the same amount as the delay. We exploit this idea to shift the mel-warped, log-compressed spectra by an amount determined from the estimated CG, yielding speaker-invariant features. Such speaker-invariant (normalised) features help improve the recognition performance of speaker-independent ASR. We show that the proposed approach is computationally efficient compared with a commonly used normalisation method, Vocal Tract Length Normalisation (VTLN). We present normalisation results showing that the performance of the proposed approach is comparable to that of conventional VTLN while retaining the advantage of computational efficiency.
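The shift property stated in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a non-negative, single-peaked synthetic "spectrum", defines the CG as the first moment of the sequence over bin indices, and aligns by an integer circular shift. The names `centre_of_gravity` and `shift_to_reference` are hypothetical. Real log-compressed spectra can be negative and are only approximately shifted copies of one another, so a practical system would need additional care.

```python
import numpy as np

def centre_of_gravity(x):
    """First moment (centroid) of a non-negative sequence over bin indices."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    return float(np.sum(idx * x) / np.sum(x))

def shift_to_reference(x, ref_cg):
    """Circularly shift x by the nearest integer so its CG moves to ref_cg.

    Assumption for illustration: an integer circular shift is adequate.
    """
    shift = int(round(ref_cg - centre_of_gravity(x)))
    return np.roll(x, shift)

# Toy "spectrum": a Gaussian bump centred on bin 20 (synthetic, not real data).
n = np.arange(64)
spec_a = np.exp(-0.5 * ((n - 20) / 3.0) ** 2)

# A second "speaker" whose spectrum is the same shape delayed by 5 bins.
spec_b = np.roll(spec_a, 5)

cg_a = centre_of_gravity(spec_a)
cg_b = centre_of_gravity(spec_b)

# The CGs differ by the delay, so shifting spec_b back by that
# difference lines it up with spec_a.
aligned_b = shift_to_reference(spec_b, cg_a)
```

Estimating one centroid per spectrum and applying a single shift involves no search over candidate warping factors, which is the intuition behind the efficiency claim; the abstract itself gives no cost figures.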
Pages: 2258-2261
Page count: 4