Automatic Speechreading with Applications to Human-Computer Interfaces

Authors
Xiaozheng Zhang
Charles C. Broun
Russell M. Mersereau
Mark A. Clements
Affiliations
[1] Georgia Institute of Technology, Center for Signal and Image Processing
[2] Motorola Human Interface Lab
Keywords
automatic speechreading; visual feature extraction; Markov random fields; hidden Markov models; polynomial classifier; speech recognition; speaker verification
DOI
Not available
Abstract
There has been growing interest in introducing speech as a new modality into the human-computer interface (HCI). Because speech is inherently multimodal, the visual component can supply information that is not always present in the acoustic signal, enabling better system performance than acoustic-only methods, especially in noisy environments. In this paper, we investigate the usefulness of visual speech information in HCI-related applications. We first introduce a new algorithm that automatically locates the mouth region using color and motion information, and then segments the lip region using both color and edge information based on Markov random fields. We then derive a relevant set of visual speech parameters and incorporate them into a recognition engine. We compare the performance of various visual features, including the lip inner contour and the visibility of the tongue and teeth, to explore their impact on recognition accuracy. Using a common visual feature set, we demonstrate two applications that exploit speechreading in a joint audio-visual speech signal processing task: speech recognition and speaker verification. Experimental results on two databases demonstrate that the visual information is highly effective for improving recognition performance over a variety of acoustic noise levels.
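The idea of locating the mouth from color and motion cues can be illustrated with a minimal sketch. This is not the algorithm from the paper, whose details the abstract does not give. It assumes a pseudo-hue color cue R/(R+G), a frame-difference motion cue, and two arbitrary thresholds (`hue_min`, `motion_min`). The function name `locate_mouth` is hypothetical, and the Markov random field segmentation step is not attempted.

```python
def locate_mouth(prev_frame, frame, hue_min=0.6, motion_min=30):
    """Return (top, left, bottom, right) of pixels that are both
    reddish and moving between two frames, or None if there are none.

    Frames are lists of rows; each pixel is an (r, g, b) tuple of 0-255 ints.
    Both cues and both thresholds are illustrative assumptions.
    """
    rows, cols = [], []
    for y, (prev_row, row) in enumerate(zip(prev_frame, frame)):
        for x, ((pr, pg, pb), (r, g, b)) in enumerate(zip(prev_row, row)):
            # Color cue (assumed): pseudo-hue, higher for lips than for skin.
            pseudo_hue = r / (r + g) if (r + g) else 0.0
            # Motion cue (assumed): absolute change in mean intensity.
            motion = abs((r + g + b) - (pr + pg + pb)) / 3
            if pseudo_hue >= hue_min and motion >= motion_min:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    # Bounding box of all pixels that passed both cues.
    return min(rows), min(cols), max(rows), max(cols)
```

Requiring both cues at once is a design choice made for brevity here: color alone would also match other reddish regions, and motion alone would also match head movement. A practical system would likely smooth the mask and track the box over time.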
Published in
EURASIP Journal on Applied Signal Processing, 2002, 2002 (11): 1228-1247