Cross-Lingual Automatic Speech Recognition Using Tandem Features

Cited by: 24
Authors
Lal, Partha [1]
King, Simon [2]
Affiliations
[1] Univ Edinburgh, Sch Informat, Edinburgh EH8 9AB, Midlothian, Scotland
[2] Univ Edinburgh, CSTR, Edinburgh EH8 9AB, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Automatic speech recognition; multilayer perceptrons;
DOI
10.1109/TASL.2013.2277932
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
Automatic speech recognition depends on large amounts of transcribed speech recordings in order to estimate the parameters of the acoustic model. Recording such large speech corpora is time-consuming and expensive; as a result, sufficient quantities of data exist only for a handful of languages, and there are many more languages for which little or no data exist. Given that there are acoustic similarities between speech in different languages, it may be fruitful to use data from a well-resourced source language to estimate the acoustic models for a recognizer in a poorly-resourced target language. Previous approaches to this task have often involved making assumptions about shared phonetic inventories between the languages. Unfortunately, pairs of languages do not generally share a common phonetic inventory. We propose an indirect way of transferring information from a source language acoustic model to a target language acoustic model without having to make any assumptions about the phonetic inventory overlap. To do this, we employ tandem features, in which class-posteriors from a separate classifier are decorrelated and appended to conventional acoustic features. Tandem features have the advantage that the language of the speech data used to train the classifier need not be the same as the target language to be recognized. This is because the class-posteriors are not used directly, so they do not have to be defined over any particular set of classes. We demonstrate the use of tandem features in cross-lingual settings, including training on one or several source languages. We also examine factors which may predict a priori how much relative improvement will be brought about by using such tandem features, for a given source and target pair. In addition to conventional phoneme class-posteriors, we also investigate whether articulatory features (AFs), a multi-stream, discrete, multi-valued labeling of speech, can be used instead. This is motivated by an assumption that AFs are less language-specific than a phoneme set.
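The tandem construction described in the abstract (classifier posteriors are decorrelated and appended to conventional acoustic features) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `tandem_features`, the log compression of the posteriors, the use of PCA for decorrelation, and the dimensions (39 acoustic, 40 classes, 25 retained components) are all assumptions chosen for the example, and the random arrays merely stand in for real acoustic frames and classifier outputs.

```python
import numpy as np

def tandem_features(acoustic, posteriors, n_components=None, eps=1e-10):
    """Append decorrelated class posteriors to acoustic features.

    acoustic:   (T, D) array, one conventional feature vector per frame.
    posteriors: (T, K) array, per-frame class posteriors from a separate
                classifier (which may have been trained on another language).
    """
    # Assumption: take logs to make the skewed posteriors more Gaussian-like.
    log_post = np.log(posteriors + eps)
    # Assumption: decorrelate with PCA (eigendecomposition of the covariance).
    centred = log_post - log_post.mean(axis=0)
    cov = np.cov(centred, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]          # largest variance first
    if n_components is not None:
        order = order[:n_components]           # optional dimensionality reduction
    projected = centred @ eigvecs[:, order]
    # Tandem vector = conventional features followed by the projected posteriors.
    return np.hstack([acoustic, projected])

# Synthetic stand-ins for real data (hypothetical sizes).
rng = np.random.default_rng(0)
T, D, K = 200, 39, 40
acoustic = rng.standard_normal((T, D))
logits = rng.standard_normal((T, K))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

feats = tandem_features(acoustic, posteriors, n_components=25)
print(feats.shape)  # (200, 64)
```

In a real system the PCA transform would be estimated once on training data and then applied unchanged to test data; the sketch estimates and applies it on the same array only to stay short. Note that nothing in the function depends on what the K classes mean, which is the property the abstract relies on for cross-lingual use.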
Pages: 2506-2515
Number of pages: 10
Related papers
50 in total
  • [1] Cross-lingual Automatic Speech Recognition Exploiting Articulatory Features
    Zhan, Qingran
    Motlicek, Petr
    Du, Shixuan
    Shan, Yahui
    Ma, Sifan
    Xie, Xiang
    [J]. 2019 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2019: 1912-1916
  • [2] Zero-Shot Cross-lingual Aphasia Detection using Automatic Speech Recognition
    Chatzoudis, Gerasimos
    Plitsis, Manos
    Stamouli, Spyridoula
    Dimou, Athanasia-Lida
    Katsamanis, Nassos
    Katsouros, Vassilis
    [J]. INTERSPEECH 2022, 2022: 2178-2182
  • [3] Speech Emotion Recognition with Cross-lingual Databases
    Chiou, Bo-Chang
    Chen, Chia-Ping
    [J]. 15TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2014), VOLS 1-4, 2014: 558-561
  • [4] IMPROVING LUXEMBOURGISH SPEECH RECOGNITION WITH CROSS-LINGUAL SPEECH REPRESENTATIONS
    Le Minh Nguyen
    Nayak, Shekhar
    Coler, Matt
    [J]. 2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 792-797
  • [5] UNSUPERVISED CROSS-LINGUAL SPEECH EMOTION RECOGNITION USING PSEUDO MULTILABEL
    Li, Fin
    Yan, Nan
    Wang, Lan
    [J]. 2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021: 366-373
  • [6] CLIoS: Cross-lingual Induction of Speech Recognition Grammars
    Perera, Nadine
    Pitz, Michael
    Pinkal, Manfred
    [J]. SIXTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, LREC 2008, 2008: 2487-2494
  • [7] Unsupervised Cross-lingual Representation Learning for Speech Recognition
    Conneau, Alexis
    Baevski, Alexei
    Collobert, Ronan
    Mohamed, Abdelrahman
    Auli, Michael
    [J]. INTERSPEECH 2021, 2021: 2426-2430
  • [8] Evaluating automatic speech recognition systems as quantitative models of cross-lingual phonetic category perception
    Schatz, Thomas
    Bach, Francis
    Dupoux, Emmanuel
    [J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2018, 143(05): EL372-EL378
  • [9] Cross-lingual Speech Emotion Recognition through Factor Analysis
    Desplanques, Brecht
    Demuynck, Kris
    [J]. 19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018: 3648-3652
  • [10] CROSS-LINGUAL AND MULTILINGUAL SPEECH EMOTION RECOGNITION ON ENGLISH AND FRENCH
    Neumann, Michael
    Ngoc Thang Vu
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 5769-5773