Learning by Aligning 2D Skeleton Sequences and Multi-modality Fusion

Authors
Tran, Quoc-Huy [1]
Ahmed, Muhammad [1]
Popattia, Murad [1]
Ahmed, M. Hassan [1]
Konin, Andrey [1]
Zia, M. Zeeshan [1]
Affiliations
[1] Retrocausal Inc, Redmond, WA 98052 USA
Source
Keywords
Temporal video alignment; Temporal 2D skeleton sequence alignment; Multi-modality fusion; Self-supervised learning
DOI
10.1007/978-3-031-72973-7_9
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a self-supervised temporal video alignment framework that is useful for several fine-grained human activity understanding applications. In contrast with the state-of-the-art method CASA, which takes sequences of 3D skeleton coordinates directly as input, our key idea is to use sequences of 2D skeleton heatmaps as input. Unlike CASA, which performs self-attention in the temporal domain only, we feed 2D skeleton heatmaps to a video transformer that performs self-attention in both the spatial and temporal domains to extract effective spatiotemporal and contextual features. In addition, we introduce simple heatmap augmentation techniques based on 2D skeletons for self-supervised learning. Despite the lack of 3D information, our approach achieves not only higher accuracy but also better robustness against missing and noisy keypoints than CASA. Furthermore, extensive evaluations on three public datasets, i.e., Penn Action, IKEA ASM, and H2O, demonstrate that our approach outperforms previous methods in different fine-grained human activity understanding tasks. Finally, fusing 2D skeleton heatmaps with RGB videos yields state-of-the-art results on all metrics and datasets. To the best of our knowledge, our work is the first to utilize 2D skeleton heatmap inputs and the first to explore multi-modality fusion for temporal video alignment.
Pages: 141-161
Page count: 21