Discriminating Native from Non-Native Speech Using Fusion of Visual Cues

Cited by: 3
Authors
Georgakis, Christos [1 ]
Petridis, Stavros [1 ]
Pantic, Maja [1 ,2 ]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Twente, EEMCS, Enschede, Netherlands
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); European Union Seventh Framework Programme (FP7);
Keywords
Non-Native Speech; Visual-only Accent Classification; Foreign Accent Detection; Visual Speech Processing;
DOI
10.1145/2647868.2655026
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Subject Classification Code
081202;
Abstract
The task of classifying accent, as belonging to a native language speaker or a foreign language speaker, has so far been addressed by means of the audio modality only. However, features extracted from the visual modality have been successfully used to extend or substitute audio-only approaches developed for speech or language recognition. This paper presents a fully automated approach to discriminating native from non-native speech in English, based exclusively on visual appearance features extracted from speech. Long Short-Term Memory Neural Networks (LSTMs) are employed to model accent-related speech dynamics and yield accent-class predictions. Subject-independent experiments are conducted on speech episodes captured by mobile phones from the challenging MOBIO Database. We establish a text-dependent scenario, using only those recordings in which all subjects read the same paragraph. Our results show that decision-level fusion of networks trained with complementary appearance descriptors consistently leads to performance improvement over single-feature systems, with the highest gain in accuracy reaching 7.3%. The best feature combinations achieve classification accuracy of 75%, rendering the proposed method a useful accent classification tool when the audio stream is missing or noisy.
Pages: 1177-1180
Page count: 4
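To illustrate the kind of pipeline the abstract describes (per-stream LSTM classifiers over visual appearance features, combined by decision-level fusion), below is a minimal sketch in PyTorch. The feature dimensions, network sizes, and the posterior-averaging fusion rule are placeholder assumptions for illustration only; the paper's actual descriptors, architecture, and fusion scheme are not specified in this record.

```python
# Minimal sketch (PyTorch), under assumed settings: hypothetical feature
# dimensions and a simple average-of-posteriors fusion rule. This is NOT the
# authors' implementation, only an illustration of decision-level fusion of
# per-feature LSTM accent classifiers.
import torch
import torch.nn as nn


class AccentLSTM(nn.Module):
    """Binary native/non-native classifier over one visual feature stream."""

    def __init__(self, feat_dim, hidden_dim=64, num_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, x):                 # x: (batch, time, feat_dim)
        out, _ = self.lstm(x)              # per-frame hidden states
        return self.fc(out[:, -1, :])      # classify from the last time step


# Two networks trained on complementary appearance descriptors
# (dimensions are placeholders, not values from the paper).
net_a = AccentLSTM(feat_dim=50)
net_b = AccentLSTM(feat_dim=30)


def fuse_predictions(seq_a, seq_b):
    """Decision-level fusion: average the two networks' class posteriors."""
    with torch.no_grad():
        p_a = torch.softmax(net_a(seq_a), dim=-1)
        p_b = torch.softmax(net_b(seq_b), dim=-1)
    fused = (p_a + p_b) / 2
    return fused.argmax(dim=-1)            # 0 = native, 1 = non-native (by convention here)


if __name__ == "__main__":
    # Random tensors stand in for per-utterance visual feature sequences.
    seq_a = torch.randn(4, 120, 50)        # 4 utterances, 120 frames, 50-dim features
    seq_b = torch.randn(4, 120, 30)
    print(fuse_predictions(seq_a, seq_b))
```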