A Multimodal Connectionist Architecture for Unsupervised Grounding of Spatial Language

Cited by: 10
Authors
Vavrečka, Michal [1]
Farkaš, Igor [2]
Affiliations
[1] Czech Technical University, Department of Cybernetics, CR-16635 Prague, Czech Republic
[2] Comenius University, Department of Applied Informatics, Bratislava 84248, Slovakia
Keywords
Unsupervised learning; Self-organizing map; Symbol grounding; Spatial phrases; Multimodal representations; Self-organizing network; Words; Model
DOI
10.1007/s12559-013-9212-5
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We propose a bio-inspired unsupervised connectionist architecture and apply it to grounding spatial phrases. The two-layer architecture combines information from the visual and phonological inputs by concatenation. In the first layer, the visual pathway employs separate 'what' and 'where' subsystems that represent, respectively, the identity of two objects in 2D space and the spatial relation between them. Bitmap images are presented to an artificial retina, and phonologically encoded five-word sentences describing each image (e.g., 'blue ball above red cup') serve as the phonological input. The visual scene is thus represented by several self-organizing maps (SOMs), while the phonological description is processed by a Recursive SOM that learns to represent the spatial phrases topographically. The primary representations from the first-layer modules are unambiguously integrated in a multimodal second-layer module, implemented either as a SOM or with the neural gas algorithm. The system learns to bind the proper lexical and visual features without any prior knowledge. The simulations reveal that processing and representing spatial location and object shape separately significantly improves the model's performance. We provide quantitative experimental results comparing the accuracy of three models.
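The abstract describes a two-layer data flow: first-layer maps (separate 'what' and 'where' maps for vision, a Recursive SOM for phonology) whose outputs are concatenated and fed to a multimodal second-layer map. The following is a minimal sketch of that flow under stated assumptions, not the authors' implementation. It uses a plain Kohonen SOM with a Gaussian grid neighbourhood and linearly decaying learning rate and width; it replaces the Recursive SOM with a plain SOM over a fixed-length vector (no recurrent context); it assumes each first-layer map passes up a one-hot map of its winning unit; and it uses a SOM rather than neural gas at the second layer. The class `SOM`, the function `multimodal_input`, the toy vectors and all parameter values are hypothetical.

```python
import math
import random


class SOM:
    """Minimal Kohonen self-organizing map on a rows x cols grid (illustrative sketch)."""

    def __init__(self, rows, cols, dim, seed=0):
        rng = random.Random(seed)
        self.rows, self.cols, self.dim = rows, cols, dim
        # One weight vector per grid unit, initialised uniformly in [0, 1).
        self.weights = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]

    def winner(self, x):
        # Best-matching unit: index of the weight vector closest to x (squared Euclidean distance).
        return min(range(len(self.weights)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(x, self.weights[i])))

    def train(self, data, epochs, lr0=0.5, sigma0=None):
        sigma0 = sigma0 if sigma0 is not None else max(self.rows, self.cols) / 2
        total, t = epochs * len(data), 0
        for _ in range(epochs):
            for x in data:
                frac = t / total
                # Learning rate and neighbourhood width both decay linearly over training.
                lr, sigma = lr0 * (1 - frac), max(sigma0 * (1 - frac), 0.5)
                wr, wc = divmod(self.winner(x), self.cols)
                for i, w in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    # Gaussian neighbourhood on the grid, centred on the winning unit.
                    h = math.exp(-((r - wr) ** 2 + (c - wc) ** 2) / (2 * sigma ** 2))
                    for k in range(self.dim):
                        w[k] += lr * h * (x[k] - w[k])
                t += 1

    def activation(self, x):
        # Assumed upward output: a one-hot map marking the winning unit.
        out = [0.0] * len(self.weights)
        out[self.winner(x)] = 1.0
        return out


def multimodal_input(maps, inputs):
    # Concatenate first-layer outputs (e.g. 'what', 'where', phonology) into one vector.
    vec = []
    for som, x in zip(maps, inputs):
        vec.extend(som.activation(x))
    return vec


# Toy usage with made-up 2-D feature vectors per modality (not the paper's data).
what = SOM(3, 3, 2, seed=1)
where = SOM(3, 3, 2, seed=2)
phon = SOM(3, 3, 2, seed=3)
scenes = [((0.1, 0.9), (0.2, 0.2), (0.8, 0.1)),
          ((0.9, 0.1), (0.7, 0.8), (0.1, 0.9))]
for som, k in ((what, 0), (where, 1), (phon, 2)):
    som.train([s[k] for s in scenes], epochs=50)

# Second layer: a SOM over the 27-dimensional concatenation of three 3x3 one-hot maps.
top = SOM(4, 4, 27, seed=4)
top.train([multimodal_input((what, where, phon), s) for s in scenes], epochs=50)
```

Concatenating one-hot winner maps keeps the modalities in disjoint slices of the second-layer input, so the top map can only group scenes whose visual and phonological winners co-occur; a graded activation map would be an equally plausible choice under different assumptions.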
Pages: 101-112
Number of pages: 12
Journal: Cognitive Computation, Vol. 6, 2014