Discriminating Native from Non-Native Speech Using Fusion of Visual Cues

Citations: 3
Authors
Georgakis, Christos [1]
Petridis, Stavros [1]
Pantic, Maja [1,2]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Twente, EEMCS, Enschede, Netherlands
Funding
Engineering and Physical Sciences Research Council (UK); European Union Seventh Framework Programme;
Keywords
Non-Native Speech; Visual-only Accent Classification; Foreign Accent Detection; Visual Speech Processing;
DOI
10.1145/2647868.2655026
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The task of classifying an accent as belonging to a native or a non-native speaker has so far been addressed by means of the audio modality only. However, features extracted from the visual modality have been successfully used to extend or substitute audio-only approaches developed for speech or language recognition. This paper presents a fully automated approach to discriminating native from non-native speech in English, based exclusively on visual appearance features from speech. Long Short-Term Memory Neural Networks (LSTMs) are employed to model accent-related speech dynamics and yield accent-class predictions. Subject-independent experiments are conducted on speech episodes captured by mobile phones from the challenging MOBIO Database. We establish a text-dependent scenario, using only those recordings in which all subjects read the same paragraph. Our results show that decision-level fusion of networks trained with complementary appearance descriptors consistently leads to performance improvement over single-feature systems, with the highest gain in accuracy reaching 7.3%. The best feature combinations achieve a classification accuracy of 75%, rendering the proposed method a useful accent classification tool in cases of a missing or noisy audio stream.
Pages: 1177-1180
Page count: 4
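
The decision-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which fusion rule the authors use, so the averaging (sum) rule below is an assumption, as are the function names, the class labels, and the example scores; it only shows the general idea of combining the class scores of several single-feature classifiers after each has made its own prediction.

```python
from typing import List, Sequence


def fuse_decisions(posteriors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-class scores across classifiers (assumed sum rule)."""
    if not posteriors:
        raise ValueError("need at least one classifier output")
    n_classes = len(posteriors[0])
    if any(len(p) != n_classes for p in posteriors):
        raise ValueError("all classifiers must score the same classes")
    n_models = len(posteriors)
    return [sum(p[c] for p in posteriors) / n_models for c in range(n_classes)]


def predict(posteriors: Sequence[Sequence[float]],
            labels: Sequence[str] = ("native", "non-native")) -> str:
    """Return the label with the highest fused score."""
    fused = fuse_decisions(posteriors)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best]


if __name__ == "__main__":
    # Hypothetical class scores from two networks, each trained on a
    # different appearance descriptor (values invented for illustration).
    scores_descriptor_a = [0.40, 0.60]
    scores_descriptor_b = [0.75, 0.25]
    print(predict([scores_descriptor_a, scores_descriptor_b]))
```

In this invented example the two networks disagree, and the fused score favors the class that the more confident network supports. Other fusion rules (product, weighted sum, majority vote) would slot into `fuse_decisions` in the same way.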