Multimodal Score Fusion with Sparse Low-rank Bilinear Pooling for Egocentric Hand Action Recognition

Authors
Roy, Kankana [1, 2]
Affiliations
[1] Karolinska Inst, Dept Oncol Pathol, S-17177 Stockholm, Sweden
[2] Indian Inst Technol Kharagpur, Dept Comp Sci & Engn, Kharagpur 721302, West Bengal, India
Keywords
Bilinear score pooling; egocentric hand action recognition; RGB-D videos; sparse; low rank; CNN; RNN; neural networks
DOI
10.1145/3656044
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The advent of egocentric cameras brings new challenges that traditional computer vision techniques are not sufficient to handle. Moreover, egocentric cameras often offer multiple modalities that need to be modeled jointly to exploit complementary information. In this article, we propose a sparse low-rank bilinear score pooling approach for egocentric hand action recognition from RGB-D videos. It consists of five blocks: a baseline CNN that encodes RGB and depth information to produce classification probabilities; a novel bilinear score pooling block that generates a score matrix; a sparse low-rank matrix recovery block that reduces the redundant features common in bilinear pooling; a one-layer CNN for frame-level classification; and an RNN for video-level classification. We propose to fuse classification probabilities from the RGB and depth modalities instead of traditional CNN features, using a simple yet effective sparse low-rank bilinear score pooling to produce a fused RGB-D score matrix. To demonstrate the efficacy of our method, we perform extensive experiments on two large-scale hand action datasets, THU-READ and FPHA, and two smaller datasets, GUN-71 and HAD. We observe that the proposed method outperforms state-of-the-art methods and achieves accuracies of 78.55% and 96.87% on the THU-READ dataset in the cross-subject and cross-group settings, respectively. Further, we achieve accuracies of 91.59% and 43.87% on the FPHA and GUN-71 datasets, respectively.
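To make the score-level fusion idea concrete, here is a minimal sketch in plain Python. It is not the article's implementation: it assumes that bilinear score pooling can be illustrated as the outer product of the RGB and depth class-probability vectors, and it uses element-wise soft-thresholding only as a simple stand-in for the sparsity step. The function names, the example logits, and the threshold `tau` are hypothetical, and the low-rank recovery, frame-level CNN, and RNN blocks are not modeled.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bilinear_score_pool(p_rgb, p_depth):
    """Assumed form of bilinear score pooling: the outer product of the two
    probability vectors, giving a C x C matrix of pairwise class scores."""
    return [[a * b for b in p_depth] for a in p_rgb]


def soft_threshold(matrix, tau):
    """Illustrative sparsity step (an assumption, not the article's solver):
    shrink every entry toward zero by tau and zero out the small ones."""
    return [
        [math.copysign(max(abs(v) - tau, 0.0), v) for v in row]
        for row in matrix
    ]


# Hypothetical per-frame logits for three action classes.
rgb_logits = [2.0, 0.5, -1.0]
depth_logits = [1.5, 1.0, -0.5]

p_rgb = softmax(rgb_logits)
p_depth = softmax(depth_logits)

score = bilinear_score_pool(p_rgb, p_depth)   # fused RGB-D score matrix
sparse_score = soft_threshold(score, 0.05)    # weak class pairs become zero
```

In this sketch, entry (i, j) of `score` is the product of the RGB probability for class i and the depth probability for class j, so the whole matrix sums to one; thresholding then suppresses class pairs on which both modalities are unconfident.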
Pages: 22