Combining CNN streams of dynamic image and depth data for action recognition

Cited by: 17
Authors
Singh, Roshan [1]
Khurana, Rajat [2]
Kushwaha, Alok Kumar Singh [2]
Srivastava, Rajeev [1]
Affiliations
[1] IIT BHU, Dept Comp Sci & Engn, Varanasi, Uttar Pradesh, India
[2] IKG Punjab Tech Univ, Dept Comp Sci & Engn, Kapurthala, Punjab, India
Keywords
Human activity recognition; RGB-D; CNN; VGG; Multi-stream CNN models; Transfer learning; ENSEMBLE;
DOI
10.1007/s00530-019-00645-5
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
RGB-D sensors are in great demand because they can produce large amounts of multimodal data, such as RGB images and depth maps, which are useful for training deep learning models. This paper proposes a deep learning model that recognizes human activities in a video sequence by combining multiple CNN streams. The proposed work uses dynamic images generated from RGB images and depth maps for three different dimensions. The model is trained on these four streams using VGG Net for action recognition. It is then evaluated and compared with other state-of-the-art methods from the literature on three challenging datasets, namely MSR Daily Activity, UTD MHAD and CAD 60, in terms of accuracy, error, recall, specificity, precision and F-score. The results show that the proposed method outperforms the other methods.
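The abstract does not say how the dynamic images are computed or how the four streams are combined. As a rough illustration only, the sketch below assumes the simplified approximate rank pooling weights (alpha_t = 2t - T - 1) commonly used to build dynamic images, and assumes late fusion by averaging per-stream class probabilities. The function names `dynamic_image` and `fuse_scores` are hypothetical and do not come from the paper.

```python
import numpy as np


def dynamic_image(frames: np.ndarray) -> np.ndarray:
    """Collapse a clip of shape (T, H, W, C) into one (H, W, C) image.

    Assumption: simplified approximate rank pooling, where frame t
    (1-indexed) gets weight alpha_t = 2t - T - 1, so later frames
    contribute positively and earlier frames negatively.
    """
    num_frames = frames.shape[0]
    t = np.arange(1, num_frames + 1, dtype=np.float64)
    alpha = 2.0 * t - num_frames - 1.0
    # Weighted sum over the time axis.
    pooled = np.tensordot(alpha, frames.astype(np.float64), axes=(0, 0))
    lo, hi = pooled.min(), pooled.max()
    if hi == lo:
        # A static clip carries no motion information.
        return np.zeros_like(pooled)
    # Rescale to [0, 255] so the result can be fed to an image CNN.
    return 255.0 * (pooled - lo) / (hi - lo)


def fuse_scores(stream_scores: list) -> np.ndarray:
    """Assumed late fusion: average the class probabilities of all streams."""
    return np.stack(stream_scores, axis=0).mean(axis=0)
```

Under these assumptions, each of the four streams would produce one probability vector per clip, and `fuse_scores` would combine them into a single prediction. Other fusion rules (weighted averaging, product, or a learned classifier on concatenated features) would fit the same interface.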
Pages: 313 - 322
Number of pages: 10
Related papers
50 in total
  • [21] Multi-channels CNN Temporal Features for Depth-based Action Recognition
    Trelinski, Jacek
    Kwolek, Bogdan
    TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
  • [22] Combining depth and colour data for 3D object recognition
    Jorgenson, TM
    Linneberg, C
    Andersen, AW
    INTELLIGENT ROBOTS AND COMPUTER VISION XVI: ALGORITHMS, TECHNIQUES, ACTIVE VISION, AND MATERIALS HANDLING, 1997, 3208 : 328 - 338
  • [23] A survey on deep neural networks for human action recognition in RGB image and depth image
    Wang, Hongyu
    ENERGY SCIENCE AND APPLIED TECHNOLOGY (ESAT 2016), 2016, : 697 - 703
  • [24] Finger vein recognition based on lightweight CNN combining center loss and dynamic regularization
    Zhao, Dongdong
    Ma, Hui
    Yang, Zedong
    Li, Jianian
    Tian, Wenbo
    INFRARED PHYSICS & TECHNOLOGY, 2020, 105
  • [25] Action recognition by fusing depth video and skeletal data information
    Ioannis Kapsouras
    Nikos Nikolaidis
    Multimedia Tools and Applications, 2019, 78 : 1971 - 1998
  • [26] Action recognition by fusing depth video and skeletal data information
    Kapsouras, Ioannis
    Nikolaidis, Nikos
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (02) : 1971 - 1998
  • [27] FUSION OF DEPTH, SKELETON, AND INERTIAL DATA FOR HUMAN ACTION RECOGNITION
    Chen, Chen
    Jafari, Roozbeh
    Kehtarnavaz, Nasser
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 2712 - 2716
  • [28] A Textured Object Recognition Pipeline for Color and Depth Image Data
    Tang, Jie
    Miller, Stephen
    Singh, Arjun
    Abbeel, Pieter
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012, : 3467 - 3474
  • [29] Action recognition from depth sequence using depth motion maps-based local ternary patterns and CNN
    Li, Zhifei
    Zheng, Zhonglong
    Lin, Feilong
    Leung, Howard
    Li, Qing
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (14) : 19587 - 19601
  • [30] Action recognition from depth sequence using depth motion maps-based local ternary patterns and CNN
    Zhifei Li
    Zhonglong Zheng
    Feilong Lin
    Howard Leung
    Qing Li
    Multimedia Tools and Applications, 2019, 78 : 19587 - 19601