Invariant recognition drives neural representations of action sequences

Cited by: 8
Authors
Tacchetti A. [1]
Isik L. [1]
Poggio T. [1]
Affiliations
[1] Center for Brains, Minds and Machines, Massachusetts Institute of Technology, Cambridge, MA
Source
Tacchetti, Andrea (atacchet@mit.edu) | PLOS Computational Biology (Public Library of Science), 2017, vol. 13
Funding
National Science Foundation (United States)
Keywords
Action recognition - Action sequences - Complex transformations - Convolutional neural network - Human perception - Invariant recognition - Neural representations - Social cues - Visual intelligence - Visual stimulus
DOI
10.1371/journal.pcbi.1005859
Abstract
Recognizing the actions of others from visual stimuli is a crucial aspect of human perception that allows individuals to respond to social cues. Humans are able to discriminate between similar actions despite transformations, like changes in viewpoint or actor, that substantially alter the visual appearance of a scene. This ability to generalize across complex transformations is a hallmark of human visual intelligence. Advances in understanding action recognition at the neural level have not always translated into precise accounts of the computational principles that determine which representations of action sequences human visual cortex constructs. Here we test the hypothesis that invariant action discrimination might fill this gap. Recently, the study of artificial systems for static object perception has produced models, Convolutional Neural Networks (CNNs), that achieve human-level performance in complex discriminative tasks. Within this class, architectures that better support invariant object recognition also produce image representations that better match those implied by human and primate neural data. However, whether these models produce representations of action sequences that support recognition across complex transformations and closely follow neural representations of actions remains unknown. Here we show that spatiotemporal CNNs accurately categorize video stimuli into action classes, and that deliberate model modifications that improve performance on an invariant action recognition task lead to data representations that better match human neural recordings. Our results support our hypothesis that performance on invariant discrimination dictates the neural representations of actions computed in the brain. These results broaden the scope of the invariant recognition framework for understanding visual intelligence from perception of inanimate objects and faces in static images to the study of human perception of action sequences.
© 2017 Tacchetti et al.