AFFINE INVARIANT FEATURES AND THEIR APPLICATION TO SPEECH RECOGNITION

Citations: 7

Authors:

Qiao, Yu [1]

Suzuki, Masayuki [1]

Minematsu, Nobuaki [1]

Affiliation:

[1] Univ Tokyo, Grad Sch Engn, Tokyo, Japan

Source:

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS | 2009

Keywords:

Affine invariant feature; frequency warping; speaker normalization; speech recognition;

DOI:

10.1109/ICASSP.2009.4960662

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

This paper proposes a set of affine invariant features (AIFs) for sequence data. The proposed AIFs can be calculated directly from the sequence data, and their invariance to affine transformation is proved mathematically through algebraic calculation. We apply the AIFs to speech recognition. Since vocal tract length (VTL) differences cause frequency warping, which can be approximated well by an affine transform on cepstral features [1], the AIFs of a cepstral sequence provide features that are robust to VTL variations. We experimentally examine the invariance of the AIFs of speech signals, and apply AIFs to Japanese isolated word recognition. The experimental results show that combining AIFs with MFCC or MFCC+Delta can lead to higher recognition rates than MFCC or MFCC+Delta alone. In the mismatched experiments in particular, the combination with AIFs can reduce the error rates by about 30% compared to MFCC or MFCC+Delta alone. Because their invariance is general, the AIFs are expected to have applications beyond speech recognition.
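
To make the notion of affine invariance concrete, the following is a minimal sketch and not the AIFs defined in the paper, whose formulas are not given in this record. It uses a swapped-in, well-known invariant: the squared Mahalanobis distance between frames, measured with the sequence's own covariance, which is unchanged under any invertible affine map x -> Ax + b. The function name `pairwise_mahalanobis`, the random test sequence, and the matrix `A` are all assumptions made for illustration.

```python
import numpy as np


def pairwise_mahalanobis(seq):
    """Squared Mahalanobis distance between every pair of frames.

    seq has shape (num_frames, dim). The covariance is estimated from
    the sequence itself, which is what makes the result affine invariant.
    This is an illustrative stand-in, not the paper's AIF definition.
    """
    centered = seq - seq.mean(axis=0)
    cov = centered.T @ centered / (len(seq) - 1)
    inv_cov = np.linalg.inv(cov)
    diff = seq[:, None, :] - seq[None, :, :]  # shape (T, T, dim)
    # d[i, j] = diff[i, j]^T  inv_cov  diff[i, j]
    return np.einsum("ijk,kl,ijl->ij", diff, inv_cov, diff)


rng = np.random.default_rng(0)
seq = rng.normal(size=(50, 4))  # hypothetical 4-dimensional feature sequence

# Hypothetical affine map; the added multiple of the identity keeps A
# comfortably away from singular for this seed.
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)
b = rng.normal(size=4)
warped = seq @ A.T + b

original_features = pairwise_mahalanobis(seq)
warped_features = pairwise_mahalanobis(warped)
max_abs_diff = float(np.max(np.abs(original_features - warped_features)))
print("largest change after the affine map:", max_abs_diff)
```

The invariance follows because the covariance of the transformed sequence is A Σ Aᵀ and the frame differences become A(x_i - x_j), so the factors of A cancel in the quadratic form. Any difference printed above is floating-point round-off.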


Pages: 4629-4632

Number of pages: 4

50 items in total

[1] Affine-invariant visual features contain supplementary information to enhance speech recognition
Gurbuz, S
Patterson, E
Tufekci, Z
Gowdy, JN
[J]. AUDIO- AND VIDEO-BASED BIOMETRIC PERSON AUTHENTICATION, PROCEEDINGS, 2001, 2091 : 175 - 181
[2] Aircraft recognition based on affine invariant features
Xu, Shugong
Sang, Nong
Zhang, Jiansen
Huang, Zailu
[J]. Huazhong Ligong Daxue Xuebao/Journal Huazhong (Central China) University of Science and Technology, 1995, 23 (10):
[3] Application of affine-invariant Fourier descriptors to lipreading for audio-visual speech recognition
Gurbuz, S
Tufekci, Z
Patterson, E
Gowdy, JN
[J]. 2001 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-VI, PROCEEDINGS: VOL I: SPEECH PROCESSING 1; VOL II: SPEECH PROCESSING 2 IND TECHNOL TRACK DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS NEURALNETWORKS FOR SIGNAL PROCESSING; VOL III: IMAGE & MULTIDIMENSIONAL SIGNAL PROCESSING MULTIMEDIA SIGNAL PROCESSING - VOL IV: SIGNAL PROCESSING FOR COMMUNICATIONS; VOL V: SIGNAL PROCESSING EDUCATION SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO & ELECTROACOUSTICS; VOL VI: SIGNAL PROCESSING THEORY & METHODS STUDENT FORUM, 2001, : 177 - 180
[4] Robust speech recognition by extracting invariant features
Eskikand, Parvin Zarei
Seyyedsalehi, Seyyed Ali
[J]. 4TH INTERNATIONAL CONFERENCE OF COGNITIVE SCIENCE, 2012, 32 : 230 - 237
[5] ADVERSARIAL LEARNING OF RAW SPEECH FEATURES FOR DOMAIN INVARIANT SPEECH RECOGNITION
Tripathi, Aditay
Mohan, Aanchan
Anand, Saket
Singh, Maneesh
[J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 5959 - 5963
[6] Speaker-Invariant Features for Automatic Speech Recognition
Umesh, S.
Sanand, D. R.
Praveen, G.
[J]. 20TH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2007, : 1738 - 1743
[7] Vocal tract length invariant features for automatic speech recognition
Mertins, A
Rademacher, J
[J]. 2005 IEEE WORKSHOP ON AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING (ASRU), 2005, : 308 - 312
[8] Frequency-warping invariant features for automatic speech recognition
Mertins, Alfred
Rademacher, Jan
[J]. 2006 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-13, 2006, : 5883 - 5886
[9] Improved Warping-Invariant Features for Automatic Speech Recognition
Rademacher, Jan
Waechter, Matthias
Mertins, Alfred
[J]. INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1499 - 1502
[10] Traffic signs recognition based on affine invariant Hu's moment features
Liu, Min
Mao, Jianxu
[J]. MECHATRONICS AND INDUSTRIAL INFORMATICS, PTS 1-4, 2013, 321-324 : 945 - 949
