A review of speech-based bimodal recognition

Cited by: 120
Authors
Chibelushi, CC [1]
Deravi, F
Mason, JSD
Affiliations
[1] Staffordshire Univ, Sch Comp, Stafford ST18 0DG, Staffs, England
[2] Univ Kent, Elect Engn Lab, Canterbury CT2 7NT, Kent, England
[3] Univ Coll Swansea, Dept Elect & Elect Engn, Swansea SA2 8PP, W Glam, Wales
Keywords
audio-visual fusion; joint media processing; multimodal recognition; speaker recognition; speech recognition;
DOI
10.1109/6046.985551
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.
Pages: 23-37
Number of pages: 15
Related papers
50 in total
  • [31] Recognition of bimodal produced speech based on Support Vector Machines
    Galic, Jovan
    Pavlovic, Dragana Sumarac
    Jovicic, Slobodan T.
    Markovic, Branko
    Grozdic, Dorde
    [J]. 2017 25TH TELECOMMUNICATION FORUM (TELFOR), 2017, : 362 - 365
  • [32] Bimodal Recognition of Cognitive Load Based on Speech and Physiological Changes
    Held, Dennis
    Meudt, Sascha
    Schwenker, Friedhelm
    [J]. MULTIMODAL PATTERN RECOGNITION OF SOCIAL SIGNALS IN HUMAN-COMPUTER-INTERACTION, MPRSS 2016, 2017, 10183 : 12 - 23
  • [33] Could speaker, gender or age awareness be beneficial in speech-based emotion recognition?
    Sidorov, Maxim
    Schmitt, Alexander
    Semenkin, Eugene
    Minker, Wolfgang
    [J]. Proceedings of the 10th International Conference on Language Resources and Evaluation, LREC 2016, 2016, : 61 - 68
  • [34] Deep Bispectral Image Analysis for Speech-based Conversational Emotional Climate Recognition
    Alhussein, Ghada
    Alkhodari, Mohanad
    Alfalahi, Hessa
    Alshehhi, Amnaa
    Hadjileontiadis, Leontios J.
    [J]. 17TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2024, 2024, : 576 - 581
  • [37] Speech-based recognition of self-reported and observed emotion in a dimensional space
    Truong, Khiet P.
    van Leeuwen, David A.
    de Jong, Franciska M. G.
    [J]. SPEECH COMMUNICATION, 2012, 54 (09) : 1049 - 1063
  • [38] Feature selection enhancement and feature space visualization for speech-based emotion recognition
    Kanwal, Sofia
    Asghar, Sohail
    Ali, Hazrat
    [J]. PEERJ COMPUTER SCIENCE, 2022, 8
  • [39] Speech-based Gesture Generation for Robots and Embodied Agents: A Scoping Review
    Liu, Yu
    Mohammadi, Gelareh
    Song, Yang
    Johal, Wafa
    [J]. PROCEEDINGS OF THE 9TH INTERNATIONAL USER MODELING, ADAPTATION AND PERSONALIZATION HUMAN-AGENT INTERACTION, HAI 2021, 2021, : 31 - 38
  • [40] Bimodal Speech Recognition for Robot Applications
    Sagheer, Alaa
    Aly, Saleh
    Anter, Samar
    [J]. MAN-MACHINE INTERACTIONS 3, 2014, 242 : 87 - 94