A Multimodal Connectionist Architecture for Unsupervised Grounding of Spatial Language

Cited by: 10

Authors:

Vavrecka, Michal [1]

Farkas, Igor [2]

Affiliations:

[1] Czech Tech Univ, Dept Cybernet, CR-16635 Prague, Czech Republic

[2] Comenius Univ, Dept Appl Informat, Bratislava 84248, Slovakia

Source:

COGNITIVE COMPUTATION | 2014 / Vol. 6 / Issue 01

Keywords:

Unsupervised learning; Self-organizing map; Symbol grounding; Spatial phrases; Multimodal representations; SELF-ORGANIZING NETWORK; WORDS; MODEL;

DOI:

10.1007/s12559-013-9212-5

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We propose a bio-inspired unsupervised connectionist architecture and apply it to grounding the spatial phrases. The two-layer architecture combines by concatenation the information from the visual and the phonological inputs. In the first layer, the visual pathway employs separate 'what' and 'where' subsystems that represent the identity and spatial relations of two objects in 2D space, respectively. The bitmap images are presented to an artificial retina and the phonologically encoded five-word sentences describing the image serve as the phonological input. The visual scene is hence represented by several self-organizing maps (SOMs) and the phonological description is processed by the Recursive SOM that learns to topographically represent the spatial phrases, represented as five-word sentences (e.g., 'blue ball above red cup'). Primary representations from the first-layer modules are unambiguously integrated in a multimodal second-layer module, implemented by the SOM or the 'neural gas' algorithms. The system learns to bind proper lexical and visual features without any prior knowledge. The simulations reveal that separate processing and representation of the spatial location and the object shape significantly improve the performance of the model. We provide quantitative experimental results comparing three models in terms of their accuracy.
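The two-layer idea in the abstract (separate first-layer maps whose outputs are concatenated and fed to a multimodal second-layer map) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the `SOM` class, the toy feature vectors, the map sizes, the training schedule, and the use of one-hot winner codes as first-layer outputs are all hypothetical. In particular, a plain SOM stands in for the Recursive SOM on the phonological side, so word order is not modeled here.

```python
import math
import random


class SOM:
    """Tiny self-organizing map on a rows x cols grid (illustrative only)."""

    def __init__(self, rows, cols, dim, rng):
        self.rows, self.cols, self.dim = rows, cols, dim
        self.weights = [[rng.random() for _ in range(dim)]
                        for _ in range(rows * cols)]

    def bmu(self, x):
        """Return the index of the best-matching unit (smallest squared distance)."""
        dists = [sum((w - v) ** 2 for w, v in zip(unit, x))
                 for unit in self.weights]
        return dists.index(min(dists))

    def train(self, data, epochs=30, lr0=0.5, sigma0=1.5):
        for epoch in range(epochs):
            frac = epoch / epochs
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.3)  # shrinking neighbourhood width
            for x in data:
                wr, wc = divmod(self.bmu(x), self.cols)
                for i, unit in enumerate(self.weights):
                    r, c = divmod(i, self.cols)
                    d2 = (r - wr) ** 2 + (c - wc) ** 2
                    h = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian neighbourhood
                    for k in range(self.dim):
                        unit[k] += lr * h * (x[k] - unit[k])

    def one_hot(self, x):
        """Assumed first-layer output: a one-hot code of the winning unit."""
        out = [0.0] * (self.rows * self.cols)
        out[self.bmu(x)] = 1.0
        return out


rng = random.Random(0)

# Hypothetical toy scenes: (what features, where features, phrase code).
scenes = [
    ([1.0, 0.0, 0.0], [0.0, 1.0], [1.0, 0.0, 0.0, 0.0]),
    ([0.0, 1.0, 0.0], [0.0, -1.0], [0.0, 1.0, 0.0, 0.0]),
    ([0.0, 0.0, 1.0], [1.0, 0.0], [0.0, 0.0, 1.0, 0.0]),
    ([1.0, 0.0, 0.0], [-1.0, 0.0], [0.0, 0.0, 0.0, 1.0]),
]

# First layer: one map per pathway ('what', 'where', phonological stand-in).
what_som = SOM(2, 2, 3, rng)
where_som = SOM(2, 2, 2, rng)
phon_som = SOM(2, 2, 4, rng)
what_som.train([s[0] for s in scenes])
where_som.train([s[1] for s in scenes])
phon_som.train([s[2] for s in scenes])


def first_layer_code(what, where, phon):
    """Concatenate the three first-layer outputs into one multimodal vector."""
    return what_som.one_hot(what) + where_som.one_hot(where) + phon_som.one_hot(phon)


# Second layer: a multimodal map trained on the concatenated codes.
codes = [first_layer_code(*s) for s in scenes]
multimodal_som = SOM(3, 3, len(codes[0]), rng)
multimodal_som.train(codes)

winners = [multimodal_som.bmu(code) for code in codes]
print(winners)
```

The printed list gives the second-layer winning unit for each toy scene. Evaluating whether such units bind the correct lexical and visual features would require the dataset and accuracy measures described in the paper, which this sketch does not reproduce.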


Pages: 101-112

Page count: 12

50 items in total

[41] CONNECTIONIST APPROACHES TO LANGUAGE DISORDERS
HARLEY, TA
[J]. APHASIOLOGY, 1993, 7 (03) : 221 - 249
[42] Visual bootstrapping for unsupervised symbol grounding
Kittler, Josef
Shevchenko, Mikhail
Windridge, David
[J]. ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, PROCEEDINGS, 2006, 4179 : 1037 - 1046
[43] Language of Thought: The Connectionist Contribution
Murat Aydede
[J]. Minds and Machines, 1997, 7 : 57 - 101
[44] Unsupervised multimodal processing
Nyamapfene, Abel
Ahmad, Khurshid
[J]. PROCEEDINGS OF THE IASTED INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND APPLICATIONS, 2007, : 14 - +
[45] Unsupervised connectionist network for fault diagnosis of helicopter gearboxes
Jammu, VB
Danai, K
Lewicki, DG
[J]. AMERICAN HELICOPTER SOCIETY - 53RD ANNUAL FORUM PROCEEDINGS, VOLS 1 AND 2, 1997, : 1297 - 1307
[46] A taxonomy for spatiotemporal connectionist networks revisited: The unsupervised case
Barreto, GD
Araújo, AFR
[J]. NEURAL COMPUTATION, 2003, 15 (06) : 1255 - 1320
[47] Efficient Grounding of Abstract Spatial Concepts for Natural Language Interaction with Robot Manipulators
Paul, Rohan
Arkin, Jacob
Roy, Nicholas
Howard, Thomas M.
[J]. ROBOTICS: SCIENCE AND SYSTEMS XII, 2016,
[48] Efficient grounding of abstract spatial concepts for natural language interaction with robot platforms
Paul, Rohan
Arkin, Jacob
Aksaray, Derya
Roy, Nicholas
Howard, Thomas M.
[J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2018, 37 (10) : 1269 - 1299
[49] Language Conditioned Spatial Relation Reasoning for 3D Object Grounding
Chen, Shizhe
Guhur, Pierre-Louis
Tapaswi, Makarand
Schmid, Cordelia
Laptev, Ivan
[J]. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
[50] GROUNDING LANGUAGE IN PERCEPTION
SISKIND, JM
[J]. ARTIFICIAL INTELLIGENCE REVIEW, 1995, 8 (5-6) : 371 - 391
