A review of speech-based bimodal recognition

Cited by: 120
Authors
Chibelushi, CC [1]
Deravi, F
Mason, JSD
Affiliations
[1] Staffordshire Univ, Sch Comp, Stafford ST18 0DG, Staffs, England
[2] Univ Kent, Elect Engn Lab, Canterbury CT2 7NT, Kent, England
[3] Univ Coll Swansea, Dept Elect & Elect Engn, Swansea SA2 8PP, W Glam, Wales
Keywords
audio-visual fusion; joint media processing; multimodal recognition; speaker recognition; speech recognition
DOI
10.1109/6046.985551
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues as well as possible application domains.
Pages: 23-37
Page count: 15