Finding the correspondence of audio-visual events caused by multiple movements

Cited by: 0

Authors:

Chen, J. [1]
Mukai, T. [1]
Takeuchi, Y. [1]
Matsumoto, T. [1]
Kudo, H. [1]
Yamamura, T. [1]
Ohnishi, N. [1]

Affiliation:

[1] Nagoya University, Furo-cho, Chikusa-ku, Nagoya 464-8603, Japan

Source:

Kyokai Joho Imeji Zasshi/Journal of the Institute of Image Information and Television Engineers | 2001 / Vol. 55 / No. 11

Keywords:

Cameras; Correlation methods; Microphones; Speech recognition

DOI:

10.3169/itej.55.1450

Abstract:

We understand our environment by integrating information from sight, hearing, and touch. To integrate information across senses, we must first find the correspondence between events observed by different senses. This paper presents a general method for relating the audio-visual events of multiple movements, both repetitive and non-repetitive, observed by a single camera and a single microphone. The method relies on general laws rather than object-specific knowledge. As correspondence cues, we use Gestalt grouping laws: simultaneity between the occurrence of a sound and a change in movement, and similarity of repetition between sound and movement. Experiments in a real environment gave satisfactory results, demonstrating the effectiveness of the proposed method.
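The two cues named in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that sound onset times and movement-change times (in seconds) have already been extracted from the microphone and camera. The function names, the tolerance window, and the choice to simply add the two scores are all hypothetical.

```python
from statistics import median


def simultaneity_score(sound_onsets, motion_changes, tolerance=0.1):
    """Fraction of sound onsets with a motion change within `tolerance` seconds.

    The tolerance window is an assumed parameter, not taken from the paper.
    """
    if not sound_onsets:
        return 0.0
    matched = sum(
        1
        for s in sound_onsets
        if any(abs(s - m) <= tolerance for m in motion_changes)
    )
    return matched / len(sound_onsets)


def dominant_period(event_times):
    """Median interval between consecutive events, or None for fewer than two."""
    if len(event_times) < 2:
        return None
    ordered = sorted(event_times)
    return median(b - a for a, b in zip(ordered, ordered[1:]))


def repetition_similarity(sound_onsets, motion_changes):
    """Ratio of the shorter period to the longer one (1.0 means identical)."""
    p_sound = dominant_period(sound_onsets)
    p_motion = dominant_period(motion_changes)
    if p_sound is None or p_motion is None or max(p_sound, p_motion) == 0:
        return 0.0  # non-repetitive events: only simultaneity contributes
    return min(p_sound, p_motion) / max(p_sound, p_motion)


def best_match(sound_onsets, movements, tolerance=0.1):
    """Name of the movement whose change events best match the sound.

    `movements` maps a hypothetical movement label to its change times.
    Summing the two cue scores is an assumption made for illustration.
    """
    def score(changes):
        return (
            simultaneity_score(sound_onsets, changes, tolerance)
            + repetition_similarity(sound_onsets, changes)
        )

    return max(movements, key=lambda name: score(movements[name]))


# Made-up event times: one sound source and two candidate movements.
sound = [0.5, 1.5, 2.5, 3.5]
candidates = {
    "hammer": [0.52, 1.49, 2.51, 3.48],
    "pendulum": [0.2, 0.95, 1.7, 2.45, 3.2],
}
print(best_match(sound, candidates))
```

With these made-up event times, the "hammer" movement agrees with the sound on both cues, while the "pendulum" movement coincides with only one onset and repeats at a different period.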

Pages: 1450-1459
