A developmental where-what network for concurrent and interactive visual attention and recognition

Cited by: 3
Authors
Ji, Zhengping [1 ]
Weng, Juyang [2 ]
Affiliations
[1] Samsung Semicond Inc, Adv Image Res Lab ARIL, Pasadena, CA 91103 USA
[2] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
Keywords
Developmental learning; Where-what sensorimotor pathways; Attention; Recognition; Brain-inspired neural network; MODEL; ALGORITHM; CORTEX; LAYERS
DOI
10.1016/j.robot.2015.03.004
CLC classification number
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
This paper presents a brain-inspired developmental architecture called the Where-What Network (WWN). In this second version, WWN-2, the network learns concurrent and interactive visual attention and recognition via complementary pathways guided by a "type" motor and a "location" motor. The motor-driven top-down signals, together with bottom-up excitatory activities from the visual input, shape three possible information flows through a Y-shaped network. Using an l0-constrained sparse coding scheme, the top-down and bottom-up co-firing leads to a non-iterative, cell-centered synaptic update model, entailing strict entropy reduction from early to later layers, as well as a dual optimization of update directions and step sizes that depend dynamically on the firing ages of the neurons. Three operational modes for cluttered scenes emerge from the learning process, depending on what is available in the motor area: a context-free mode for detecting and recognizing a learned object in a cluttered scene, a location-context mode for object recognition, and a type-context mode for object search, all within a single network. To demonstrate these attention capabilities and their interaction with visual processing, the proposed network operates in the presence of complex backgrounds, learns on the fly, and produces engineering-grade performance in terms of attended-pixel error and recognition accuracy. As the proposed architecture is developmental, meaning that internal representations are learned from pairs of input and motor signals rather than hand-designed for a specific task, we argue that the same learning principles and computational architecture are potentially applicable to other sensory modalities, such as audition and touch. (C) 2015 Elsevier B.V. All rights reserved.
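The age-dependent, sparsity-constrained synaptic update summarized in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not code from the paper: it shows an l0-style top-k winner selection modulated by a top-down (motor-driven) bias, followed by a non-iterative Hebbian-like weight update whose step size shrinks with each neuron's firing age. The function names (`amnesic_rate`, `topk_cofiring_update`) and all schedule constants are hypothetical.

```python
import numpy as np

def amnesic_rate(age, t1=20, t2=200, c=2.0, r=2000.0):
    """Learning rate that decays with a neuron's firing age.

    Young neurons adapt quickly (rate near 1), mature neurons adapt slowly.
    The three-segment schedule mimics an "amnesic mean"; the constants are
    illustrative defaults, not values from the paper.
    """
    if age < t1:
        mu = 0.0
    elif age < t2:
        mu = c * (age - t1) / (t2 - t1)
    else:
        mu = c + (age - t2) / r
    return (1.0 + mu) / age

def topk_cofiring_update(W, ages, bottom_up, top_down, k=1):
    """Non-iterative, cell-centered update of one layer's synaptic vectors.

    W         : (n_neurons, n_inputs) synaptic weight matrix, rows unit-norm
    ages      : (n_neurons,) integer firing ages
    bottom_up : (n_inputs,) normalized bottom-up input vector
    top_down  : (n_neurons,) normalized top-down (motor-driven) bias
    k         : number of winners allowed to fire (l0-style sparsity)
    """
    # Pre-response: bottom-up match, boosted where the top-down bias agrees.
    response = (W @ bottom_up) * (1.0 + top_down)

    # l0 constraint: only the top-k co-firing neurons stay active.
    winners = np.argsort(response)[-k:]

    for i in winners:
        ages[i] += 1
        lr = amnesic_rate(ages[i])  # step size set by the neuron's firing age
        # Hebbian-like pull of the winner's weights toward the current input.
        W[i] = (1.0 - lr) * W[i] + lr * response[i] * bottom_up
        W[i] /= np.linalg.norm(W[i]) + 1e-12  # keep the synaptic vector unit-norm
    return W, ages
```

In terms of the three operational modes described above, the `top_down` bias in this sketch would be zero in the context-free mode and would be driven by whichever motor signal (location or type) is imposed in the location-context and type-context modes; this mapping is our reading of the abstract, not an implementation detail given by the authors.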
Pages: 35 - 48
Number of pages: 14
Related papers
50 records in total
  • [21] What and Where to See: Deep Attention Aggregation Network for Action Detection
    He, Yuxuan
    Gan, Ming-Gang
    Liu, Xiaozhou
    INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT I, 2022, 13455 : 177 - 187
  • [22] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [23] SIA-Net: Sparse Interactive Attention Network for Multimodal Emotion Recognition
    Li, Shuzhen
    Zhang, Tong
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 1 - 13
  • [24] Attention-based Pyramid Aggregation Network for Visual Place Recognition
    Zhu, Yingying
    Wang, Jiong
    Xie, Lingxi
    Zheng, Liang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 99 - 107
  • [26] Masked face recognition with convolutional visual self-attention network
    Ge, Yiming
    Liu, Hui
    Du, Junzhao
    Li, Zehua
    Wei, Yuheng
    NEUROCOMPUTING, 2023, 518 : 496 - 506
  • [27] Interactive activation in visual word recognition: Constraints imposed by the joint effects of spatial attention and semantics
    Stolz, JA
    Stevanovski, B
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2004, 30 (06) : 1064 - 1076
  • [28] Face recognition in video using a what-and-where fusion neural network
    Barry, M.
    Granger, E.
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 2256 - 2261
  • [29] Visual smoke recognition based on an inverse-radiating attention pyramid network
    Liu, Yuchen
    Liu, Hongyan
    Jiang, Yanlin
    Wang, Mingxing
    Wei, Liang
    Gu, Ke
    DISPLAYS, 2024, 84
  • [30] MAFormer: A transformer network with multi-scale attention fusion for visual recognition
    Sun, Huixin
    Wang, Yunhao
    Wang, Xiaodi
    Zhang, Bin
    Xin, Ying
    Zhang, Baochang
    Cao, Xianbin
    Ding, Errui
    Han, Shumin
    NEUROCOMPUTING, 2024, 595