Audio-Visual Speech Cue Combination

被引：14

作者：

Arnold, Derek H. ^{[1
]}

Tear, Morgan ^{[1
]}

Schindel, Ryan ^{[1
]}

Roseboom, Warrick ^{[1
]}

机构：

[1] Univ Queensland, Sch Psychol, St Lucia, Qld, Australia

来源：

PLOS ONE | 2010年 / 5卷 / 04期

基金：

澳大利亚研究理事会;

关键词：

VISUAL-MOTION SIGNALS; INTEGRATION; INFORMATION;

D O I：

10.1371/journal.pone.0010217

中图分类号：

O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];

学科分类号：

07 ; 0710 ; 09 ;

摘要：

Background: Different sources of sensory information can interact, often shaping what we think we have seen or heard. This can enhance the precision of perceptual decisions relative to those made on the basis of a single source of information. From a computational perspective, there are multiple reasons why this might happen, and each predicts a different degree of enhanced precision. Relatively slight improvements can arise when perceptual decisions are made on the basis of multiple independent sensory estimates, as opposed to just one. These improvements can arise as a consequence of probability summation. Greater improvements can occur if two initially independent estimates are summated to form a single integrated code, especially if the summation is weighted in accordance with the variance associated with each independent estimate. This form of combination is often described as a Bayesian maximum likelihood estimate. Still greater improvements are possible if the two sources of information are encoded via a common physiological process. Principal Findings: Here we show that the provision of simultaneous audio and visual speech cues can result in substantial sensitivity improvements, relative to single sensory modality based decisions. The magnitude of the improvements is greater than can be predicted on the basis of either a Bayesian maximum likelihood estimate or a probability summation. Conclusion: Our data suggest that primary estimates of speech content are determined by a physiological process that takes input from both visual and auditory processing, resulting in greater sensitivity than would be possible if initially independent audio and visual estimates were formed and then subsequently combined.

引用

页数：5

共 50 条

[1] An audio-visual distance for audio-visual speech vector quantization
Girin, L
Foucher, E
Feng, G
[J]. 1998 IEEE SECOND WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING, 1998, : 523 - 528
[2] Audio-visual speech experience with age influences perceived audio-visual asynchrony in speech
Alm, Magnus
Behne, Dawn
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2013, 134 (04): : 3001 - 3010
[3] An audio-visual speech recognition with a new mandarin audio-visual database
Liao, Wen-Yuan
Pao, Tsang-Long
Chen, Yu-Te
Chang, Tsun-Wei
[J]. INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
[4] Expressive audio-visual speech
Bevacqua, E
Pelachaud, C
[J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2004, 15 (3-4) : 297 - 304
[5] Cue Integration in Categorical Tasks: Insights from Audio-Visual Speech Perception
Bejjanki, Vikranth Rao
Clayards, Meghan
Knill, David C.
Aslin, Richard N.
[J]. PLOS ONE, 2011, 6 (05):
[6] Effects of aging on audio-visual speech integration Effects of aging on audio-visual speech integration
Huyse, Aurelie
Leybaert, Jacqueline
Berthommier, Frederic
[J]. JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2014, 136 (04): : 1918 - 1931
[7] Audio-Visual Speech Codecs: Rethinking Audio-Visual Speech Enhancement by Re-Synthesis
Yang, Karren
Markovic, Dejan
Krenn, Steven
Agrawal, Vasu
Richard, Alexander
[J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 8217 - 8227
[8] Audio-visual speech recognition based on joint training with audio-visual speech enhancement for robust speech recognition
Hwang, Jung-Wook
Park, Jeongkyun
Park, Rae-Hong
Park, Hyung-Min
[J]. APPLIED ACOUSTICS, 2023, 211
[9] An audio-visual speech recognition system for testing new audio-visual databases
Pao, Tsang-Long
Liao, Wen-Yuan
[J]. VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
[10] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION
Zhang, Zi-Qiang
Zhang, Jie
Zhang, Jian-Shu
Wu, Ming-Hui
Fang, Xin
Dai, Li-Rong
[J]. 2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1346 - 1350

← 1 2 3 4 5 →