Speaker-Invariant Features for Automatic Speech Recognition

Cited by: 0
Authors
Umesh, S. [1]
Sanand, D. R. [1]
Praveen, G. [1]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider the generation of features for automatic speech recognition (ASR) that are robust to speaker variations. Inter-speaker variation is one of the major causes of degradation in the performance of ASR systems. These variations are commonly modeled by a pure scaling relation between the spectra of speakers enunciating the same sound. Current state-of-the-art ASR systems therefore address speaker variability with a brute-force search for the optimal scaling parameter. This procedure, known as vocal-tract length normalization (VTLN), is computationally intensive. We have recently used the Scale-Transform (a variant of the Mellin transform) to generate features that are robust to speaker variations without the need to search for the scaling parameter. However, these features have poorer performance due to the loss of phase information. In this paper, we propose to use the magnitude of the Scale-Transform together with a pre-computed "phase" vector for each phoneme to generate speaker-invariant features. We compare the performance of the proposed features with conventional VTLN on a phoneme recognition task.
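The scaling model in the abstract can be illustrated numerically. The sketch below is not the authors' feature pipeline; it is a minimal demonstration, under stated assumptions, of why the Scale-Transform magnitude needs no search over the scaling parameter. It assumes the energy-normalized scaling relation S_alpha(f) = sqrt(alpha) * S(alpha * f) and approximates the Scale-Transform as a Fourier transform of the spectrum resampled on a logarithmic frequency axis and weighted by sqrt(f). The names `toy_spectrum`, `scale_transform_magnitude`, `warped_spectrum`, the value `alpha = 1.15`, and the frequency range are all hypothetical choices made for illustration.

```python
import numpy as np


def toy_spectrum(f):
    """Hypothetical two-"formant" spectrum: Gaussian bumps in log-frequency."""
    x = np.log(f)
    bump1 = np.exp(-0.5 * ((x - np.log(500.0)) / 0.1) ** 2)
    bump2 = 0.6 * np.exp(-0.5 * ((x - np.log(1500.0)) / 0.1) ** 2)
    return bump1 + bump2


def scale_transform_magnitude(spectrum_fn, f_min=50.0, f_max=8000.0, n=1024):
    """Approximate |D(c)| with D(c) = (2*pi)^(-1/2) * integral S(f) f^(-jc-1/2) df.

    Substituting f = exp(x) turns the integral into a Fourier transform of
    S(exp(x)) * exp(x/2), which is evaluated here with an FFT on a uniform
    grid in x = ln(f). The grid limits are assumptions for this toy example.
    """
    x = np.linspace(np.log(f_min), np.log(f_max), n, endpoint=False)
    dx = x[1] - x[0]
    g = spectrum_fn(np.exp(x)) * np.exp(x / 2.0)
    return np.abs(np.fft.fft(g)) * dx / np.sqrt(2.0 * np.pi)


# Assumed speaker scaling factor (illustrative value only).
alpha = 1.15


def warped_spectrum(f):
    """Spectrum of a second "speaker": a pure, energy-normalized scaling."""
    return np.sqrt(alpha) * toy_spectrum(alpha * f)


mag_ref = scale_transform_magnitude(toy_spectrum)
mag_warped = scale_transform_magnitude(warped_spectrum)

# The raw spectra differ between the two "speakers" ...
freqs = np.linspace(100.0, 4000.0, 512)
spectra_differ = not np.allclose(
    toy_spectrum(freqs), warped_spectrum(freqs), atol=1e-3
)

# ... while the Scale-Transform magnitudes agree, because scaling in f is a
# shift in ln(f) and only changes the phase of the transform.
magnitudes_match = np.allclose(mag_ref, mag_warped, rtol=1e-6, atol=1e-9)

print("spectra differ:", spectra_differ)
print("scale-transform magnitudes match:", magnitudes_match)
```

Under this model the scaling factor appears only in the phase of the transform, which is why taking the magnitude removes it without any search, and also why phase information is lost in the process.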
Pages: 1738-1743
Number of pages: 6
Related papers
50 in total
  • [1] Automatic Recognition of Connected Vowels Only Using Speaker-invariant Representation of Speech Dynamics
    Asakawa, Satoshi
    Minematsu, Nobuaki
    Hirose, Keikichi
    [J]. INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, VOLS 1-4, 2007, : 2352 - +
  • [2] Speaker-invariant suprasegmental temporal features in normal and disguised speech
    Leemann, Adrian
    Kolly, Marie-Jose
    [J]. SPEECH COMMUNICATION, 2015, 75 : 97 - 122
  • [3] Use of Spectral Centre of Gravity for Generating Speaker Invariant Features for Automatic Speech Recognition
    Sanand, D. R.
    Balaji, V.
    Rani, R. Sandhya
    Umesh, S.
    [J]. INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 2258 - 2261
  • [4] Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning
    Wu, Long
    Chen, Hangting
    Wang, Li
    Zhang, Pengyuan
    Yan, Yonghong
    [J]. INTERSPEECH 2019, 2019, : 431 - 435
  • [5] Vocal tract length invariant features for automatic speech recognition
    Mertins, A
    Rademacher, J
    [J]. 2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 308 - 312
  • [6] Frequency-warping invariant features for automatic speech recognition
    Mertins, Alfred
    Rademacher, Jan
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 5883 - 5886
  • [7] Improved Warping-Invariant Features for Automatic Speech Recognition
    Rademacher, Jan
    Waechter, Matthias
    Mertins, Alfred
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1499 - 1502
  • [9] SPEAKER-INVARIANT TRAINING VIA ADVERSARIAL LEARNING
    Meng, Zhong
    Li, Jinyu
    Chen, Zhuo
    Zhao, Yong
    Mazalov, Vadim
    Gong, Yifan
    Juang, Biing-Hwang
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5969 - 5973
  • [10] ADAPTING TO THE SPEAKER IN AUTOMATIC SPEECH RECOGNITION
    TALBOT, M
    [J]. INTERNATIONAL JOURNAL OF MAN-MACHINE STUDIES, 1987, 27 (04): : 449 - 457