A developmental model of audio-visual attention (MAVA) for bimodal language learning in infants and robots

Cited by: 0
Authors
Bergoin, Raphael [1]
Boucenna, Sofiane [1]
D'Urso, Raphael [1]
Cohen, David [2, 3]
Pitti, Alexandre [1]
Affiliations
[1] CY Cergy Paris Univ, ENSEA, CNRS, ETIS, UMR 8051, Cergy Pontoise, France
[2] Hop La Pitie Salpetriere, AP HP, Serv Psychiat Enfant & Adolescent, Paris, France
[3] Univ Pierre & Marie Curie Paris, Inst Syst Intelligents & Robot, Paris, France
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
VISUAL-ATTENTION; TALKING-FACE; SYNCHRONY; PERCEPTION; SPEECH; OBJECT; EYES
DOI
10.1038/s41598-024-69245-2
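Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract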
A social individual must effectively manage the amount of complex information in their environment, relative to their own purpose, to obtain relevant information. This paper presents a neural architecture that aims to reproduce in robots the attention mechanisms (alerting/orienting/selecting) that are efficient in humans during audiovisual tasks. We evaluated the system on its ability to identify relevant sources of information on the faces of subjects emitting vowels. We propose a developmental model of audio-visual attention (MAVA) that combines Hebbian learning with a competition between saliency maps based on visual movement and audio energy. MAVA effectively combines bottom-up and top-down information to orient the system toward pertinent areas. The system has several advantages, including online and autonomous learning abilities, low computation time, and robustness to environmental noise. MAVA outperforms other artificial models for detecting speech sources under various noise conditions.
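The mechanism named in the abstract (Hebbian learning plus a competition between a movement-based saliency map and audio energy) can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the frame-difference saliency, the mean-squared audio energy, the multiplicative gating, the learning rate, and the 2x3 "image" are all assumptions chosen for illustration. Here the learned weights stand in for top-down information and the motion map for bottom-up information.

```python
# Illustrative sketch only; NOT the MAVA implementation from the paper.
# All names, formulas, and parameter values below are assumptions.

def motion_saliency(prev_frame, frame):
    """Bottom-up visual map: per-pixel absolute difference of two grayscale frames."""
    return [[abs(b - a) for a, b in zip(row_prev, row)]
            for row_prev, row in zip(prev_frame, frame)]

def audio_energy(samples):
    """Mean squared amplitude of one audio window (0.0 for an empty window)."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0

def hebbian_update(weights, saliency, energy, rate=0.1):
    """Hebbian rule: strengthen a location when its motion co-occurs with sound."""
    return [[w + rate * s * energy for w, s in zip(w_row, s_row)]
            for w_row, s_row in zip(weights, saliency)]

def attend(weights, saliency):
    """Winner-take-all competition over saliency gated by the learned weights."""
    best, best_pos = float("-inf"), None
    for y, (w_row, s_row) in enumerate(zip(weights, saliency)):
        for x, (w, s) in enumerate(zip(w_row, s_row)):
            score = s * (1.0 + w)
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos

# Toy scene: pixel (1, 1) plays the "mouth", pixel (0, 2) a silent distractor.
still = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
mouth_moves = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
weights = [[0.0] * 3 for _ in range(2)]

# Learning phase: the mouth moves while a vowel-like sound is present.
for _ in range(10):
    sal = motion_saliency(still, mouth_moves)
    weights = hebbian_update(weights, sal, audio_energy([1.0, -1.0]))

# Test phase: the distractor moves more than the mouth.
both_move = [[0.0, 0.0, 1.0], [0.0, 0.6, 0.0]]
test_sal = motion_saliency(still, both_move)
naive_focus = attend([[0.0] * 3 for _ in range(2)], test_sal)
learned_focus = attend(weights, test_sal)
```

In this toy setup, attention without learned weights goes to the location with the most movement, while the trained weights shift it to the location whose movement was previously paired with sound.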
Pages: 9