Finding the correspondence of audio-visual events caused by multiple movements

Cited by: 0
Authors
Chen, J. [1]
Mukai, T. [1]
Takeuchi, Y. [1]
Matsumoto, T. [1]
Kudo, H. [1]
Yamamura, T. [1]
Ohnishi, N. [1]
Affiliation
[1] Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan
Keywords
Cameras - Correlation methods - Microphones - Speech recognition
DOI
10.3169/itej.55.1450
Abstract
We understand our environment by integrating information obtained through the senses of sight, hearing, and touch. To integrate information across different senses, we must find the correspondence between events observed by the different senses. This paper presents a general method for relating the audio-visual events of more than one movement (both repetitive and non-repetitive) observed by a single camera and a single microphone. The method relies on general laws rather than object-specific knowledge. As correspondence cues, we use the Gestalt grouping laws: the simultaneity of a sound's occurrence with a change in movement, and the similarity of repetition between sound and movement. Experiments in a real environment gave satisfactory results, demonstrating the effectiveness of the proposed method.
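The abstract names two correspondence cues (simultaneity and similarity of repetition) but gives no formulas. Purely as an illustration, the Python sketch below shows one way such cues could be scored to decide which of several moving objects produced a sound. It is not the authors' method: it assumes that sound onset times and per-object motion-change times have already been extracted from the microphone and camera, and the tolerance `tol`, the median-interval period estimate, the min/max period ratio, and the multiplicative combination of the two cues are all hypothetical choices.

```python
from statistics import median


def simultaneity(sound_onsets, motion_events, tol=0.05):
    """Fraction of sound onsets that have a motion-change event within +/- tol seconds."""
    if not sound_onsets:
        return 0.0
    hits = sum(1 for s in sound_onsets
               if any(abs(s - m) <= tol for m in motion_events))
    return hits / len(sound_onsets)


def period(events):
    """Median inter-event interval, or None if there are fewer than two events."""
    ev = sorted(events)
    if len(ev) < 2:
        return None
    return median([b - a for a, b in zip(ev, ev[1:])])


def repetition_similarity(sound_onsets, motion_events):
    """Ratio of the shorter to the longer period (1.0 = identical repetition rate).

    Hypothetical choice: if either stream is non-repetitive, return a neutral 1.0
    so that only the simultaneity cue decides.
    """
    ps, pm = period(sound_onsets), period(motion_events)
    if ps is None or pm is None or ps <= 0 or pm <= 0:
        return 1.0
    return min(ps, pm) / max(ps, pm)


def match_sound(sound_onsets, objects, tol=0.05):
    """Score each object by multiplying both cues; return (best object, all scores)."""
    scores = {
        name: simultaneity(sound_onsets, ev, tol) * repetition_similarity(sound_onsets, ev)
        for name, ev in objects.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    sound = [0.50, 1.00, 1.50, 2.00]                # hypothetical sound onset times (s)
    objects = {
        "hammer": [0.49, 1.01, 1.52, 1.99],         # hypothetical motion-change times (s)
        "pendulum": [0.30, 1.10, 1.90],
    }
    best, scores = match_sound(sound, objects)
    print(best)  # prints "hammer"
```

In this toy input every sound onset coincides with a motion change of the "hammer" and the two repetition periods nearly agree, so it scores close to 1, while the "pendulum" has no motion change within the tolerance of any onset and scores 0. Multiplying the cues means an object must satisfy both to be chosen; a weighted sum would be an equally plausible design.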
Pages: 1450-1459