Speaker-Invariant Features for Automatic Speech Recognition

Authors
Umesh, S. [1]
Sanand, D. R. [1]
Praveen, G. [1]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Kanpur 208016, Uttar Pradesh, India
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we consider the generation of features for automatic speech recognition (ASR) that are robust to speaker variations. Inter-speaker variation is one of the major causes of performance degradation in ASR systems. These variations are commonly modeled by a pure scaling relation between the spectra of speakers enunciating the same sound. Current state-of-the-art ASR systems therefore address speaker variability with a brute-force search for the optimal scaling parameter. This procedure, known as vocal-tract length normalization (VTLN), is computationally intensive. We have recently used the Scale-Transform (a variant of the Mellin transform) to generate features that are robust to speaker variations without the need to search for the scaling parameter. However, these features have poorer performance because phase information is lost. In this paper, we propose to use the magnitude of the Scale-Transform together with a pre-computed "phase" vector for each phoneme to generate speaker-invariant features. We compare the performance of the proposed features with conventional VTLN on a phoneme recognition task.
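The scaling model in the abstract can be illustrated numerically. The sketch below is not the authors' implementation; it only demonstrates the general property that a frequency scaling f(w) -> f(alpha * w) becomes a shift on a logarithmic frequency axis, so the magnitude of a Fourier transform taken along that axis does not depend on alpha. The toy envelope `spectrum`, the helper `scale_transform_magnitude`, the frequency range, and the value of `alpha` are all assumptions chosen for illustration. The magnitude is normalized to unit length because an unnormalized scaling changes it by a constant factor.

```python
import numpy as np


def spectrum(w):
    """Toy spectral envelope with two formant-like peaks (hypothetical values)."""
    return (np.exp(-((w - 700.0) / 80.0) ** 2)
            + 0.6 * np.exp(-((w - 1800.0) / 150.0) ** 2))


def scale_transform_magnitude(envelope, w_min=100.0, w_max=8000.0, n=4096):
    """Approximate the Scale-Transform magnitude of envelope(w).

    Substituting w = exp(x) turns the Scale-Transform into an ordinary
    Fourier transform of g(x) = envelope(exp(x)) * exp(x / 2), which is
    evaluated here with an FFT on a uniform grid in x = ln(w).
    """
    x = np.linspace(np.log(w_min), np.log(w_max), n, endpoint=False)
    g = envelope(np.exp(x)) * np.exp(x / 2.0)
    mag = np.abs(np.fft.rfft(g))
    # Normalize so that the constant gain caused by scaling drops out.
    return mag / np.linalg.norm(mag)


alpha = 1.15  # assumed speaker-dependent scaling (warping) factor

ref = scale_transform_magnitude(spectrum)
warped = scale_transform_magnitude(lambda w: spectrum(alpha * w))

# The raw spectra differ, but the normalized magnitudes agree closely.
w_grid = np.linspace(100.0, 8000.0, 1000)
raw_diff = float(np.max(np.abs(spectrum(w_grid) - spectrum(alpha * w_grid))))
max_diff = float(np.max(np.abs(ref - warped)))
print("max raw spectrum difference:", raw_diff)
print("max magnitude-feature difference:", max_diff)
```

Taking the magnitude discards the phase of the transform, which is the information loss the abstract refers to. The sketch does not attempt to reproduce the proposed per-phoneme "phase" vector.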
Pages: 1738-1743
Number of pages: 6