A developmental where-what network for concurrent and interactive visual attention and recognition

Cited by: 3
Authors
Ji, Zhengping [1 ]
Weng, Juyang [2 ]
Affiliations
[1] Samsung Semicond Inc, Adv Image Res Lab ARIL, Pasadena, CA 91103 USA
[2] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
Keywords
Developmental learning; Where-what sensorimotor pathways; Attention; Recognition; Brain-inspired neural network; MODEL; ALGORITHM; CORTEX; LAYERS
DOI
10.1016/j.robot.2015.03.004
CLC classification number
TP [Automation technology; computer technology]
Subject classification code
0812
Abstract
This paper presents a brain-inspired developmental architecture called the Where-What Network (WWN). In this second version, WWN-2, the network learns concurrent and interactive visual attention and recognition via complementary pathways guided by a "type" motor and a "location" motor. The motor-driven top-down signals, together with bottom-up excitatory activities from the visual input, shape three possible information flows through a Y-shaped network. Using an l0-constrained sparse coding scheme, the top-down and bottom-up co-firing leads to a non-iterative, cell-centered synaptic update model, entailing strict entropy reduction from early to later layers, as well as a dual optimization of update directions and step sizes that depend dynamically on the firing ages of the neurons. Three operational modes for cluttered scenes emerge from the learning process, depending on what is available in the motor area: a context-free mode for detecting and recognizing a learned object in a cluttered scene, a location-context mode for object recognition, and a type-context mode for object search, all within a single network. To demonstrate these attention capabilities and their interaction with visual processing, the proposed network operates in the presence of complex backgrounds, learns on the fly, and produces engineering-grade performance in terms of attended-pixel error and recognition accuracy. As the proposed architecture is developmental, meaning that internal representations are learned from pairs of input and motor signals rather than hand-designed for a specific task, we argue that the same learning principles and computational architecture are potentially applicable to other sensory modalities, such as audition and touch. (C) 2015 Elsevier B.V. All rights reserved.
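The age-dependent, sparsity-constrained synaptic update summarized in the abstract can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not code from the paper: it shows an l0-style top-k winner selection modulated by a top-down (motor-driven) bias, followed by a non-iterative Hebbian-like weight update whose step size shrinks with each neuron's firing age. The function names (`amnesic_rate`, `topk_cofiring_update`) and all schedule constants are hypothetical.

```python
import numpy as np

def amnesic_rate(age, t1=20, t2=200, c=2.0, r=2000.0):
    """Learning rate that decays with a neuron's firing age.

    Young neurons adapt quickly (rate near 1), mature neurons adapt slowly.
    The three-segment schedule mimics an "amnesic mean"; the constants are
    illustrative defaults, not values from the paper.
    """
    if age < t1:
        mu = 0.0
    elif age < t2:
        mu = c * (age - t1) / (t2 - t1)
    else:
        mu = c + (age - t2) / r
    return (1.0 + mu) / age

def topk_cofiring_update(W, ages, bottom_up, top_down, k=1):
    """Non-iterative, cell-centered update of one layer's synaptic vectors.

    W         : (n_neurons, n_inputs) synaptic weight matrix, rows unit-norm
    ages      : (n_neurons,) integer firing ages
    bottom_up : (n_inputs,) normalized bottom-up input vector
    top_down  : (n_neurons,) normalized top-down (motor-driven) bias
    k         : number of winners allowed to fire (l0-style sparsity)
    """
    # Pre-response: bottom-up match, boosted where the top-down bias agrees.
    response = (W @ bottom_up) * (1.0 + top_down)

    # l0 constraint: only the top-k co-firing neurons stay active.
    winners = np.argsort(response)[-k:]

    for i in winners:
        ages[i] += 1
        lr = amnesic_rate(ages[i])  # step size set by the neuron's firing age
        # Hebbian-like pull of the winner's weights toward the current input.
        W[i] = (1.0 - lr) * W[i] + lr * response[i] * bottom_up
        W[i] /= np.linalg.norm(W[i]) + 1e-12  # keep the synaptic vector unit-norm
    return W, ages
```

In terms of the three operational modes described above, the `top_down` bias in this sketch would be zero in the context-free mode and would be driven by whichever motor signal (location or type) is imposed in the location-context and type-context modes; this mapping is our reading of the abstract, not an implementation detail given by the authors.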
Pages: 35 - 48
Number of pages: 14
Related papers
50 records in total
  • [21] What and Where to See: Deep Attention Aggregation Network for Action Detection
    He, Yuxuan
    Gan, Ming-Gang
    Liu, Xiaozhou
    INTELLIGENT ROBOTICS AND APPLICATIONS (ICIRA 2022), PT I, 2022, 13455 : 177 - 187
  • [22] Speaker-Aware Interactive Graph Attention Network for Emotion Recognition in Conversation
    Jia, Zhaohong
    Shi, Yunwei
    Liu, Weifeng
    Huang, Zhenhua
    Sun, Xiao
    ACM TRANSACTIONS ON ASIAN AND LOW-RESOURCE LANGUAGE INFORMATION PROCESSING, 2023, 22 (12)
  • [23] SIA-Net: Sparse Interactive Attention Network for Multimodal Emotion Recognition
    Li, Shuzhen
    Zhang, Tong
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, : 1 - 13
  • [24] Attention-based Pyramid Aggregation Network for Visual Place Recognition
    Zhu, Yingying
    Wang, Jiong
    Xie, Lingxi
    Zheng, Liang
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 99 - 107
  • [26] Masked face recognition with convolutional visual self-attention network
    Ge, Yiming
    Liu, Hui
    Du, Junzhao
    Li, Zehua
    Wei, Yuheng
    NEUROCOMPUTING, 2023, 518 : 496 - 506
  • [27] Interactive activation in visual word recognition: Constraints imposed by the joint effects of spatial attention and semantics
    Stolz, JA
    Stevanovski, B
    JOURNAL OF EXPERIMENTAL PSYCHOLOGY-HUMAN PERCEPTION AND PERFORMANCE, 2004, 30 (06) : 1064 - 1076
  • [28] Face recognition in video using a what-and-where fusion neural network
    Barry, M.
    Granger, E.
    2007 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-6, 2007, : 2256 - 2261
  • [29] Visual smoke recognition based on an inverse-radiating attention pyramid network
    Liu, Yuchen
    Liu, Hongyan
    Jiang, Yanlin
    Wang, Mingxing
    Wei, Liang
    Gu, Ke
    DISPLAYS, 2024, 84
  • [30] MAFormer: A transformer network with multi-scale attention fusion for visual recognition
    Sun, Huixin
    Wang, Yunhao
    Wang, Xiaodi
    Zhang, Bin
    Xin, Ying
    Zhang, Baochang
    Cao, Xianbin
    Ding, Errui
    Han, Shumin
    NEUROCOMPUTING, 2024, 595