i-Vector Modeling of Speech Attributes for Automatic Foreign Accent Recognition

Cited by: 25
Authors
Behravan, Hamid [1]
Hautamaki, Ville [1]
Siniscalchi, Sabato Marco [2, 3]
Kinnunen, Tomi [1]
Lee, Chin-Hui [3]
Affiliations
[1] Univ Eastern Finland, Sch Comp, Joensuu 80130, Finland
[2] Kore Univ Enna, Dept Comp Engn, I-94100 Enna, Italy
[3] Georgia Inst Technol, Dept Elect & Comp Engn, Atlanta, GA 30332 USA
Funding
Academy of Finland;
Keywords
Attribute detectors; English corpus; Finnish corpus; i-vector system; LANGUAGE; CLASSIFICATION; VERIFICATION; INFORMATION;
DOI
10.1109/TASLP.2015.2489558
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
We propose a unified approach to automatic foreign accent recognition. It takes advantage of recent advances in both linguistics-based and acoustics-based modeling techniques in automatic speech recognition (ASR) while overcoming the lack of the large set of transcribed data often required to design state-of-the-art ASR systems. The key idea lies in defining a common set of fundamental units "universally" across all spoken accents such that any given spoken utterance can be transcribed with this set of "accent-universal" units. In this study, we adopt a set of units describing manner and place of articulation as speech attributes. These units exist in most spoken languages and they can be reliably modeled and extracted to represent foreign accent cues. We also propose an i-vector representation strategy to model the feature streams formed by concatenating these units. Testing on both the Finnish national foreign language certificate (FSD) corpus and the English NIST 2008 SRE corpus, the experimental results with the proposed approach demonstrate a significant system performance improvement (p-value < 0.05) over conventional spectrum-based techniques. We observed up to a 15% relative error reduction over the already very strong i-vector accent recognition system when only manner information is used. Additional improvement is obtained by adding place-of-articulation cues along with context information. Furthermore, diagnostic information provided by the proposed approach can help designers further enhance system performance.
Pages: 29-41
Number of pages: 13