Self-supervised multimodal fusion transformer for passive activity recognition

Cited by: 4
Authors
Koupai, Armand K. [1 ]
Bocus, Mohammud J. [1 ]
Santos-Rodriguez, Raul [1 ]
Piechocki, Robert J. [1 ]
McConville, Ryan [1 ]
Affiliation
[1] Univ Bristol, Sch Comp Sci Elect & Elect Engn & Engn Maths, Bristol, Avon, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
deep learning; multimodal/sensor fusion; passive WiFi-based HAR; self-supervised learning; vision transformer (ViT); WI-FI; GESTURE; CSI;
DOI
10.1049/wss2.12044
Chinese Library Classification
TN [Electronic technology, Communication technology];
Subject Classification Code
0809;
Abstract
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise different activities. In this study, new properties of the Transformer architecture for multimodal sensor fusion are explored. Different signal processing techniques are used to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). The Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion, is first proposed. Experimental results show that the Fusion Transformer achieves results competitive with a ResNet architecture while requiring far fewer resources. To further improve the model, a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL) is proposed. The self-supervised Fusion Transformer outperforms the baselines, achieving a macro F1-score of 95.9%. Finally, this study shows that the approach significantly outperforms the others when trained with between 1% (2 min) and 20% (40 min) of the labelled training data.
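The abstract describes fusing multiple image-based representations (spectrograms, scalograms, MTFs) of PWR and CSI data with an attention-based Fusion Transformer. Below is a minimal, hypothetical PyTorch sketch of that kind of multimodal token fusion, not the authors' implementation: each modality image is patch-embedded, the token sequences are concatenated with a [CLS] token, and a shared Transformer encoder classifies the activity. All class, parameter and dimension names (FusionTransformerSketch, n_modalities, img_size, etc.) are illustrative assumptions.

    # Minimal sketch (assumption-based, not the authors' released code): a
    # multimodal fusion transformer that patch-embeds per-modality feature
    # images (e.g. spectrogram, scalogram, MTF), concatenates the resulting
    # token sequences with a [CLS] token, and classifies the activity.
    import torch
    import torch.nn as nn

    class FusionTransformerSketch(nn.Module):  # hypothetical name
        def __init__(self, n_modalities=3, img_size=64, patch_size=16,
                     dim=128, depth=4, heads=4, n_classes=6):
            super().__init__()
            n_patches = (img_size // patch_size) ** 2
            # One linear patch embedding per modality (illustrative choice).
            self.patch_embed = nn.ModuleList([
                nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
                for _ in range(n_modalities)
            ])
            # Learnable [CLS] token and positional embedding over all tokens.
            self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
            self.pos_embed = nn.Parameter(
                torch.zeros(1, 1 + n_modalities * n_patches, dim))
            layer = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=4 * dim,
                batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
            self.head = nn.Linear(dim, n_classes)

        def forward(self, modalities):
            # modalities: list of tensors, each (batch, 1, img_size, img_size)
            tokens = [e(x).flatten(2).transpose(1, 2)          # (B, P, dim)
                      for x, e in zip(modalities, self.patch_embed)]
            tokens = torch.cat(tokens, dim=1)                  # fuse modalities
            cls = self.cls_token.expand(tokens.size(0), -1, -1)
            z = torch.cat([cls, tokens], dim=1) + self.pos_embed
            z = self.encoder(z)                                # joint self-attention
            return self.head(z[:, 0])                          # classify from [CLS]

    # Usage: three 64x64 feature images (e.g. spectrogram, scalogram, MTF).
    model = FusionTransformerSketch()
    batch = [torch.randn(2, 1, 64, 64) for _ in range(3)]
    print(model(batch).shape)  # torch.Size([2, 6])

Concatenating the per-modality token sequences lets self-attention mix information across sensors and representations without a modality-specific fusion head; the paper's self-supervised pre-training stage is omitted from this sketch.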
Pages: 149-160
Number of pages: 12
Related papers (50 in total)
  • [31] TFDEPTH: SELF-SUPERVISED MONOCULAR DEPTH ESTIMATION WITH MULTI-SCALE SELECTIVE TRANSFORMER FEATURE FUSION. Hu, Hongli; Miao, Jun; Zhu, Guanghu; Yan, Je; Chu, Jun. IMAGE ANALYSIS & STEREOLOGY, 2024, 43(02): 139-149.
  • [32] Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. Valada, Abhinav; Mohan, Rohit; Burgard, Wolfram. INTERNATIONAL JOURNAL OF COMPUTER VISION, 2020, 128(05): 1239-1285.
  • [33] MST: Masked Self-Supervised Transformer for Visual Representation. Li, Zhaowen; Chen, Zhiyang; Yang, Fan; Li, Wei; Zhu, Yousong; Zhao, Chaoyang; Deng, Rui; Wu, Liwei; Zhao, Rui; Tang, Ming; Wang, Jinqiao. ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34.
  • [34] Self-Supervised RGB-NIR Fusion Video Vision Transformer Framework for rPPG Estimation. Park, Soyeon; Kim, Bo-Kyeong; Dong, Suh-Yeon. IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71.
  • [35] Self-supervised Hypergraph Transformer with Alignment and Uniformity for Recommendation. Yang, XianFeng; Liu, Yang. IAENG International Journal of Computer Science, 2024, 51(03): 292-300.
  • [36] Self-Supervised Pretraining Transformer for Seismic Data Denoising. Wang, Hongzhou; Lin, Jun; Li, Yue; Dong, Xintong; Tong, Xunqian; Lu, Shaoping. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62: 1-25.
  • [37] Transformer-based correlation mining network with self-supervised label generation for multimodal sentiment analysis. Wang, Ruiqing; Yang, Qimeng; Tian, Shengwei; Yu, Long; He, Xiaoyu; Wang, Bo. Neurocomputing, 2025, 618.
  • [38] A Self-Supervised Transformer With Feature Fusion for SAR Image Semantic Segmentation in Marine Aquaculture Monitoring. Fan, Jianchao; Zhou, Jianlin; Wang, Xinzhe; Wang, Jun. IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61.
  • [39] CTFusion: CNN-transformer-based self-supervised learning for infrared and visible image fusion. Du, Keying; Fang, Liuyang; Chen, Jie; Chen, Dongdong; Lai, Hua. Mathematical Biosciences and Engineering, 2024, 21(07): 6710-6730.
  • [40] Learning Self-Supervised Multimodal Representations of Human Behaviour. Shukla, Abhinav. MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020: 4748-4751.