Dynamically structuring, updating and interrelating representations of visual and linguistic discourse context

Cited by: 18
Authors
Kelleher, J [1]
Costello, F
van Genabith, J
Affiliations
[1] Deutsch Forsch Zentrum Kunstl Intelligenz, Saarbrucken, Germany
[2] Univ Coll Dublin, Dublin 2, Ireland
[3] Dublin City Univ, Dublin 9, Ireland
Keywords
visual salience; reference resolution; generating referring expressions; discourse context; cross-modal representations; synthetic vision
DOI
10.1016/j.artint.2005.04.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The fundamental claim of this paper is that salience - both visual and linguistic - is an important overarching semantic category structuring visually situated discourse. Based on this we argue that computer systems attempting to model the evolving context of a visually situated discourse should integrate models of visual and linguistic salience within their natural language processing (NLP) framework. The paper highlights the importance of dynamically updating and interrelating visual and linguistic discourse context representations. To support our approach, we have developed a real-time, natural language virtual reality (NLVR) system (called LIVE, for Linguistic Interaction with Virtual Environments) that implements an NLP framework based on both visual and linguistic salience. Within this framework saliency information underpins two of the core subtasks of NLP: reference resolution and the generation of referring expressions. We describe the theoretical basis and architecture of the LIVE NLP framework and present extensive evaluation results comparing the system's performance with that of human participants in a number of experiments. (c) 2005 Elsevier B.V. All rights reserved.
Pages: 62-102
Number of pages: 41