A review of speech-based bimodal recognition

Cited by: 120
Authors
Chibelushi, CC [1 ]
Deravi, F
Mason, JSD
Affiliations
[1] Staffordshire Univ, Sch Comp, Stafford ST18 0DG, Staffs, England
[2] Univ Kent, Elect Engn Lab, Canterbury CT2 7NT, Kent, England
[3] Univ Coll Swansea, Dept Elect & Elect Engn, Swansea SA2 8PP, W Glam, Wales
Keywords
audio-visual fusion; joint media processing; multimodal recognition; speaker recognition; speech recognition;
DOI
10.1109/6046.985551
CLC classification
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Speech recognition and speaker recognition by machine are crucial ingredients for many important applications such as natural and flexible human-machine interfaces. Most developments in speech-based automatic recognition have relied on acoustic speech as the sole input signal, disregarding its visual counterpart. However, recognition based on acoustic speech alone can be afflicted with deficiencies that preclude its use in many real-world applications, particularly under adverse conditions. The combination of auditory and visual modalities promises higher recognition accuracy and robustness than can be obtained with a single modality. Multimodal recognition is therefore acknowledged as a vital component of the next generation of spoken language systems. This paper reviews the components of bimodal recognizers, discusses the accuracy of bimodal recognition, and highlights some outstanding research issues, as well as possible application domains.
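The abstract's core idea, combining acoustic and visual evidence for recognition, is commonly realized as score-level (late) fusion: each modality produces per-class scores, which are then linearly combined. The sketch below is illustrative only; the function names, scores, and weight are assumptions for demonstration, not material from the reviewed paper.

```python
# Score-level (late) fusion of audio and visual recognizer outputs.
# All scores and the weight below are illustrative assumptions.

def fuse_scores(audio_scores, visual_scores, audio_weight=0.7):
    """Linearly combine per-class scores from two modalities.

    audio_scores / visual_scores: dicts mapping class label -> score in [0, 1].
    audio_weight: trust placed in the acoustic modality; the visual
    modality receives (1 - audio_weight). Lowering audio_weight models
    acoustically noisy conditions, where the visual channel helps most.
    """
    return {label: audio_weight * audio_scores[label]
                   + (1.0 - audio_weight) * visual_scores[label]
            for label in audio_scores}

def decide(fused):
    """Pick the class with the highest fused score."""
    return max(fused, key=fused.get)

# Hypothetical example: acoustic noise leaves the audio recognizer
# nearly undecided between two speakers; the visual channel disambiguates.
audio = {"speaker_A": 0.52, "speaker_B": 0.48}
visual = {"speaker_A": 0.20, "speaker_B": 0.80}
print(decide(fuse_scores(audio, visual, audio_weight=0.5)))  # speaker_B
```

The fixed weight is the simplest design choice; the survey's theme of robustness under adverse conditions suggests that in practice the weight would be adapted to an estimate of acoustic signal quality.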
Pages: 23 - 37
Page count: 15
Related papers
50 items in total
  • [1] Robust Speech-Based Happiness Recognition
    Lin, Chang-Hong
    Siahaan, Ernestasia
    Chin, Yu-Hau
    Chen, Bo-Wei
    Wang, Jia-Ching
    Wang, Jhing-Fa
    [J]. 1ST INTERNATIONAL CONFERENCE ON ORANGE TECHNOLOGIES (ICOT 2013), 2013, : 227 - 230
  • [2] Speech-Based Activity Recognition for Trauma Resuscitation
    Abdulbaqi, Jalal
    Gu, Yue
    Xu, Zhichao
    Gao, Chenyang
    Marsic, Ivan
    Burd, Randall S.
    [J]. 2020 8TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI 2020), 2020, : 376 - 383
  • [3] Effect of Reverberation in Speech-based Emotion Recognition
    Zhao, Shujie
    Yang, Yan
    Chen, Jingdong
    [J]. 2018 IEEE INTERNATIONAL CONFERENCE ON THE SCIENCE OF ELECTRICAL ENGINEERING IN ISRAEL (ICSEE), 2018,
  • [4] An investigation of speech-based human emotion recognition
    Wang, YJ
    Guan, L
    [J]. 2004 IEEE 6TH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 2004, : 15 - 18
  • [5] Towards Robust Speech-Based Emotion Recognition
    Tabatabaei, Talieh S.
    Krishnan, Sridhar
    [J]. 2010 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010,
  • [6] Automatic Bimodal Audiovisual Speech Recognition: A Review
    Kandagal, Amaresh P.
    Udayashankara, V.
    [J]. 2014 INTERNATIONAL CONFERENCE ON CONTEMPORARY COMPUTING AND INFORMATICS (IC3I), 2014, : 940 - 945
  • [7] Speech-based Emotion Recognition and Next Reaction Prediction
    Noroozi, Fatemeh
    Akrami, Neda
    Anbarjafari, Gholamreza
    [J]. 2017 25TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2017,
  • [8] Difficulties in Automatic Speech Recognition of Dysarthric Speakers and Implications for Speech-Based Applications Used by the Elderly: A Literature Review
    Young, Victoria
    Mihailidis, Alex
    [J]. ASSISTIVE TECHNOLOGY, 2010, 22 (02) : 99 - 112
  • [9] ECHO: A speech recognition package for the design of robust interactive speech-based applications
    Kabré H.
    [J]. International Journal of Speech Technology, 1997, 2 (2) : 133 - 143
  • [10] Compensate the Speech Recognition Delays for Accurate Speech-Based Cursor Position Control
    Tong, Qiang
    Wang, Ziyun
    [J]. HUMAN-COMPUTER INTERACTION, PT II, 2009, 5611 : 752 - 760