AFFINE INVARIANT FEATURES AND THEIR APPLICATION TO SPEECH RECOGNITION

Cited by: 7
Authors
Qiao, Yu [1 ]
Suzuki, Masayuki [1 ]
Minematsu, Nobuaki [1 ]
Affiliation
[1] Univ Tokyo, Grad Sch Engn, Tokyo, Japan
Keywords
Affine invariant feature; frequency warping; speaker normalization; speech recognition
DOI
10.1109/ICASSP.2009.4960662
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to affine transformation is proved mathematically through algebraic calculation. We apply the AIFs to speech recognition. Differences in vocal tract length (VTL) cause frequency warping, which can be well approximated by an affine transformation of cepstral features [1]; the AIFs of a cepstral sequence therefore provide features that are robust to VTL variations. We experimentally examine the invariance of the AIFs of speech signals and apply AIFs to Japanese isolated word recognition. The experimental results show that combining AIFs with MFCC or MFCC+Delta yields higher recognition rates than MFCC or MFCC+Delta alone. In the mismatched experiments in particular, the combination with AIFs reduces error rates by about 30% compared to MFCC or MFCC+Delta alone. Because their invariance property is general, AIFs are expected to find applications beyond speech recognition.
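The abstract does not state the AIF formulas, so the sketch below does not reproduce the paper's features. Under that caveat, it illustrates one standard way to derive a quantity from a sequence that is unchanged when every frame undergoes the same invertible affine map y_t = A x_t + b: center the frames, then take inner products through the inverse sample covariance. Centered frames transform as A c_t and the covariance as A S A^T, so each term c_i^T S^{-1} c_j is unchanged. The function names (affine_invariant_gram and its helpers), the 1/n covariance, and the toy two-dimensional "cepstral" frames are hypothetical choices made for illustration.

```python
"""Sketch: an affine-invariant Gram matrix for a sequence of feature vectors.

Assumption: this is a generic Mahalanobis-style invariant, NOT the AIF
definition from the paper (which the abstract does not state).
"""
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def mean(seq: List[Vector]) -> Vector:
    """Per-dimension mean of the frames."""
    n = len(seq)
    return [sum(v[k] for v in seq) / n for k in range(len(seq[0]))]


def covariance(seq: List[Vector], mu: Vector) -> Matrix:
    """Biased (1/n) sample covariance of the frames."""
    n, d = len(seq), len(mu)
    return [[sum((v[i] - mu[i]) * (v[j] - mu[j]) for v in seq) / n
             for j in range(d)] for i in range(d)]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    d = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance is singular; need more varied frames")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(d):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, d + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][d] / m[i][i] for i in range(d)]


def affine_invariant_gram(seq: List[Vector]) -> Matrix:
    """G[i][j] = (x_i - mu)^T S^{-1} (x_j - mu); unchanged under x -> A x + b."""
    mu = mean(seq)
    s = covariance(seq, mu)
    d = len(mu)
    centered = [[v[k] - mu[k] for k in range(d)] for v in seq]
    whitened = [solve(s, c) for c in centered]  # S^{-1} (x_j - mu)
    return [[sum(ci[k] * wj[k] for k in range(d)) for wj in whitened]
            for ci in centered]


def apply_affine(seq: List[Vector], a: Matrix, b: Vector) -> List[Vector]:
    """Apply the same affine map y = A x + b to every frame."""
    return [[sum(a[i][k] * v[k] for k in range(len(v))) + b[i]
             for i in range(len(a))] for v in seq]


if __name__ == "__main__":
    # Toy 2-D "cepstral" frames (hypothetical values, not speech data).
    x = [[0.0, 1.0], [1.0, 0.5], [2.0, 2.0], [0.5, 3.0], [1.5, -1.0]]
    y = apply_affine(x, [[2.0, 1.0], [0.5, 3.0]], [1.0, -2.0])
    gx, gy = affine_invariant_gram(x), affine_invariant_gram(y)
    diff = max(abs(p - q) for rx, ry in zip(gx, gy) for p, q in zip(rx, ry))
    print(f"max |G_x - G_y| = {diff:.2e}")
```

Running the script should print a difference on the order of floating-point rounding error. The construction needs an invertible covariance, i.e. more frames than dimensions with non-degenerate spread; with the 1/n covariance, the diagonal of G always sums to (number of frames) x (dimension), whatever affine map is applied.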
Pages: 4629-4632
Number of pages: 4
Related papers
50 in total
  • [1] Affine-invariant visual features contain supplementary information to enhance speech recognition
    Gurbuz, S
    Patterson, E
    Tufekci, Z
    Gowdy, JN
    [J]. AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2001, 2091 : 175 - 181
  • [2] Aircraft recognition based on affine invariant features
    Xu, Shugong
    Sang, Nong
    Zhang, Jiansen
    Huang, Zailu
    [J]. Huazhong Ligong Daxue Xuebao/Journal Huazhong (Central China) University of Science and Technology, 1995, 23 (10):
  • [3] Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition
    Gurbuz, S
    Tufekci, Z
    Patterson, E
    Gowdy, JN
    [J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 177 - 180
  • [4] Robust speech recognition by extracting invariant features
    Eskikand, Parvin Zarei
    Seyyedsalehi, Seyyed Ali
    [J]. 4TH INTERNATIONAL CONFERENCE OF COGNITIVE SCIENCE, 2012, 32 : 230 - 237
  • [5] ADVERSARIAL LEARNING OF RAW SPEECH FEATURES FOR DOMAIN INVARIANT SPEECH RECOGNITION
    Tripathi, Aditay
    Mohan, Aanchan
    Anand, Saket
    Singh, Maneesh
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5959 - 5963
  • [6] Speaker-Invariant Features for Automatic Speech Recognition
    Umesh, S.
    Sanand, D. R.
    Praveen, G.
    [J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1738 - 1743
  • [7] Vocal tract length invariant features for automatic speech recognition
    Mertins, A
    Rademacher, J
    [J]. 2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 308 - 312
  • [8] Frequency-warping invariant features for automatic speech recognition
    Mertins, Alfred
    Rademacher, Jan
    [J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 5883 - 5886
  • [9] Improved Warping-Invariant Features for Automatic Speech Recognition
    Rademacher, Jan
    Waechter, Matthias
    Mertins, Alfred
    [J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1499 - 1502
  • [10] Traffic signs recognition based on affine invariant Hu's moment features
    Liu, Min
    Mao, Jianxu
    [J]. MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 945 - 949