A deep multimodal network based on bottleneck layer features fusion for action recognition

Cited by: 9
Authors
Singh, Tej [1]
Vishwakarma, Dinesh Kumar [1]
Affiliations
[1] Delhi Technol Univ, Dept Informat Technol, Biometr Res Lab, Delhi 110042, India
Keywords
Human Activity Recognition (HAR); Deep learning; DCA; SVM; Level fusion; Skeleton; Depth; Representation; Information; Descriptor; Joints
DOI
10.1007/s11042-021-11415-9
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Human Activity Recognition (HAR) in videos using convolutional neural networks has become the preferred choice for researchers because of the tremendous success of deep learning models in visual recognition applications. Since the introduction of low-cost depth sensors, activity recognition systems based on multiple modalities have been developed successfully over the past decade. Nevertheless, recognizing complex human activities in videos remains challenging. In this work, we propose a deep bottleneck multimodal feature fusion (D-BMFF) framework that fuses three modalities, RGB, depth (RGB-D) and 3D coordinate information, for activity classification. This makes fuller use of the information that a depth sensor provides simultaneously. During training, RGB and depth frames are sampled at regular intervals from an activity video, while the 3D coordinates are first converted into a single RGB skeleton motion history image (RGB-SklMHI). Features are extracted from the multimodal inputs using a recent pre-trained deep network architecture. The multimodal features obtained from the bottleneck layers before the top layer are fused using multiset discriminant correlation analysis (M-DCA), which allows for robust visual action modeling. Finally, the fused features are classified with a linear multiclass support vector machine (SVM). The proposed approach is evaluated on four standard RGB-D datasets: UT-Kinect, CAD-60, Florence 3D and SBU Interaction. The reported results outperform state-of-the-art methods on these datasets.
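The two data-handling steps named in the abstract, sampling frames at regular intervals and combining per-modality feature vectors, can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the sampling rate, the pre-trained network, or the details of M-DCA, so the function names below are hypothetical, and plain L2-normalised concatenation is used as a simple stand-in for the fusion step. It is not M-DCA and not the authors' implementation.

```python
import math
from typing import List, Sequence


def sample_frame_indices(num_frames: int, num_samples: int) -> List[int]:
    """Pick num_samples frame indices spread evenly across a video.

    Hypothetical helper: the abstract only says frames are fed
    "at regular intervals"; the exact scheme is an assumption.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    # Take the frame at the centre of each of the num_samples equal segments.
    return [int(step * i + step / 2) for i in range(num_samples)]


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_by_concatenation(features: Sequence[Sequence[float]]) -> List[float]:
    """Stand-in fusion: normalise each modality's vector, then concatenate.

    Assumption: this replaces M-DCA purely for illustration. M-DCA also
    learns class-discriminative projections, which this sketch does not do.
    """
    fused: List[float] = []
    for vec in features:
        fused.extend(l2_normalize(vec))
    return fused


if __name__ == "__main__":
    # Toy values standing in for RGB, depth and RGB-SklMHI bottleneck features.
    rgb, depth, skeleton = [3.0, 4.0], [0.0, 2.0], [1.0, 0.0]
    print(sample_frame_indices(num_frames=10, num_samples=5))
    print(fuse_by_concatenation([rgb, depth, skeleton]))
```

In a full pipeline the fused vector would then be passed to a linear multiclass SVM, as the abstract describes; that step is omitted here to keep the sketch dependency-free.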
Pages: 33505 - 33525
Number of pages: 21
Related papers
50 in total
  • [1] A deep multimodal network based on bottleneck layer features fusion for action recognition
    Tej Singh
    Dinesh Kumar Vishwakarma
    [J]. Multimedia Tools and Applications, 2021, 80 : 33505 - 33525
  • [2] Deep learning network model based on fusion of spatiotemporal features for action recognition
    Ge Yang
    Wu-xing Zou
    [J]. Multimedia Tools and Applications, 2022, 81 : 9875 - 9896
  • [3] Deep learning network model based on fusion of spatiotemporal features for action recognition
    Yang, Ge
    Zou, Wu-xing
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (07) : 9875 - 9896
  • [4] Deep Neural Network Bottleneck Features for Acoustic Event Recognition
    Mun, Seongkyu
    Shon, Suwon
    Kim, Wooil
    Ko, Hanseok
    [J]. 17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 2954 - 2957
  • [5] Diverse Features Fusion Network for video-based action recognition
    Deng, Haoyang
    Kong, Jun
    Jiang, Min
    Liu, Tianshan
    [J]. JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2021, 77
  • [6] Hybrid features for skeleton-based action recognition based on network fusion
    Chen, Zhangmeng
    Pan, Junjun
    Yang, Xiaosong
    Qin, Hong
    [J]. COMPUTER ANIMATION AND VIRTUAL WORLDS, 2020, 31 (4-5)
  • [7] Finger Multimodal Features Fusion and Recognition Based on CNN
    Wang, Li
    Zhang, Haigang
    Yang, Jingfeng
    [J]. 2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 3183 - 3188
  • [8] A Deep Reinforcement Learning Method For Multimodal Data Fusion in Action Recognition
    Guo, Jiale
    Liu, Qiang
    Chen, Enqing
    [J]. IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 120 - 124
  • [9] Human Action Recognition Based on Fusion Features
    Yang, Shiqiang
    Yang, Jiangtao
    Li, Fei
    Fan, Guohao
    Li, Dexin
    [J]. CYBER SECURITY INTELLIGENCE AND ANALYTICS, 2020, 928 : 569 - 579
  • [10] Facial Expression Recognition Based on Fusion of Local Features and Deep Belief Network
    Wang Linlin
    Liu Jinghao
    Fu Xiaomei
    [J]. LASER & OPTOELECTRONICS PROGRESS, 2018, 55 (01)