VTLN in the MFCC Domain: Band-Limited versus Local Interpolation

Cited by: 0
Authors
Variani, Ehsan [1]
Schaaf, Thomas [2]
Affiliations
[1] Johns Hopkins Univ, CLSP, Baltimore, MD 21218 USA
[2] M Modal, Pittsburgh, PA 94565 USA
Keywords
Automatic speech recognition; VTLN; frequency warping; linear transform; TRANSFORMATION; ADAPTATION; SPEECH;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a new, easy-to-implement method to compute a Linear Transform (LT) that performs Vocal Tract Length Normalization (VTLN) on the truncated Mel Frequency Cepstral Coefficients (MFCCs) normally used in distributed speech recognition. The method is based on a local interpolation that is independent of the Mel filter design. Local Interpolation VTLN (LILT-VTLN) is compared, theoretically and experimentally, with a global scheme based on band-limited interpolation (BLI-VTLN) and with the conventional frequency warping scheme (FFT-VTLN). An investigation of the interoperability of these methods shows that the performance of LILT-VTLN is on par with that of FFT-VTLN and BLI-VTLN. A statistical significance test also shows no significant differences between FFT-VTLN, LILT-VTLN, and BLI-VTLN, even when the models and front ends do not match.
Pages: 1280 / +
Page count: 2