Finding the correspondence of audio-visual events caused by multiple movements

Cited: 0
Authors
Chen, J. [1 ]
Mukai, T. [1 ]
Takeuchi, Y. [1 ]
Matsumoto, T. [1 ]
Kudo, H. [1 ]
Yamamura, T. [1 ]
Ohnishi, N. [1 ]
Affiliation
[1] Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Keywords
Cameras; Correlation methods; Microphones; Speech recognition
DOI
10.3169/itej.55.1450
Abstract
We understand the environment by integrating information obtained through the senses of sight, hearing, and touch. To integrate information across different senses, we must find the correspondence between events observed by those senses. This paper presents a general method for relating the audio-visual events of more than one movement (repetitive and non-repetitive) observed by a single camera and a single microphone. The method uses general laws without object-specific knowledge. As correspondence cues, we use the Gestalt grouping laws: simultaneity of the occurrence of a sound and a change in movement, and similarity of repetition between sound and movement. We conducted experiments in a real environment and obtained satisfactory results showing the effectiveness of the proposed method.
Pages: 1450-1459
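
As an illustration only (this is not the authors' implementation, and every function and variable name below is hypothetical), a minimal Python sketch of the two grouping cues named in the abstract, assuming sound-onset times and per-object motion-change times have already been extracted from the microphone and camera streams:

import numpy as np

def simultaneity_score(audio_onsets, motion_changes, window=0.1):
    # Simultaneity cue: fraction of sound onsets that fall within
    # +/- window seconds of some motion change of this object.
    if len(audio_onsets) == 0 or len(motion_changes) == 0:
        return 0.0
    hits = sum(np.min(np.abs(motion_changes - t)) <= window for t in audio_onsets)
    return hits / len(audio_onsets)

def repetition_period(event_times):
    # Crude repetition estimate: median interval between successive events.
    if len(event_times) < 2:
        return None
    return float(np.median(np.diff(event_times)))

def similarity_score(period_a, period_v, tol=0.2):
    # Similarity cue: do the audio and visual repetition periods agree
    # within a relative tolerance?
    if period_a is None or period_v is None:
        return 0.0
    return float(abs(period_a - period_v) / max(period_a, period_v) <= tol)

# Toy example: two moving objects, only one of which produces the sound.
audio_onsets = np.array([0.50, 1.00, 1.52, 2.01])  # detected sound onsets (s)
objects = {
    "A": np.array([0.49, 1.01, 1.50, 2.02]),       # motion-change times, object A
    "B": np.array([0.30, 0.95, 1.80]),             # motion-change times, object B
}
for name, motion in objects.items():
    s = simultaneity_score(audio_onsets, motion)
    r = similarity_score(repetition_period(audio_onsets), repetition_period(motion))
    print(f"object {name}: simultaneity={s:.2f}, repetition similarity={r:.0f}")

In this toy run, object A scores high on both cues and would be matched to the sound; the paper itself extracts these events from the raw image and audio signals, a step this sketch deliberately skips.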