Bayesian networks in multimodal speech recognition and speaker identification

Cited by: 0
Authors
Nefian, AV [1]
Liang, LH [1]
Affiliation
[1] Intel Corp, Santa Clara, CA 95051 USA
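Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract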
Bayesian networks are statistical models that extend the framework of hidden Markov models (HMMs) and allow for the analysis of multimodal signals such as audio-visual speech. Our recent results demonstrate the use of coupled HMMs in audio-visual speech recognition and speaker identification. The increased performance of this model is due to its low complexity and its ability to describe both the asynchrony between audio and visual states and their natural dependency over time. The audio-visual speaker identification accuracy is enhanced in a late decision approach that integrates the audio-visual speech likelihood and the face likelihood computed using an embedded Bayesian network.
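The late decision step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted sum of log-likelihoods below is an assumption (one common form of late fusion), and the function name `late_fusion_identify`, the parameter `face_weight`, and the speaker labels are all hypothetical.

```python
def late_fusion_identify(av_speech_loglik, face_loglik, face_weight=0.5):
    """Pick the speaker with the highest fused score.

    av_speech_loglik, face_loglik: dicts mapping speaker id -> log-likelihood,
    assumed to come from two separately trained models.
    face_weight: assumed tuning parameter in [0, 1]; not taken from the paper.
    """
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")

    # Only speakers scored by both models can be compared.
    speakers = sorted(av_speech_loglik.keys() & face_loglik.keys())
    if not speakers:
        raise ValueError("no speaker is scored by both models")

    # Weighted sum of log-likelihoods (equivalent to a weighted product
    # of likelihoods); this is the assumed fusion rule.
    scores = {
        s: (1.0 - face_weight) * av_speech_loglik[s] + face_weight * face_loglik[s]
        for s in speakers
    }
    best = max(speakers, key=scores.__getitem__)
    return best, scores


# Hypothetical log-likelihoods for two enrolled speakers.
av = {"alice": -10.0, "bob": -12.0}
face = {"alice": -8.0, "bob": -3.0}
best, scores = late_fusion_identify(av, face, face_weight=0.5)
```

With `face_weight=0.0` the decision relies on the audio-visual speech score alone, and with `face_weight=1.0` on the face score alone; in practice the weight would be tuned on held-out data.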
Pages: 2004-2008
Page count: 5