Self-supervised multimodal fusion transformer for passive activity recognition

Cited by: 4
Authors
Koupai, Armand K. [1 ]
Bocus, Mohammud J. [1 ]
Santos-Rodriguez, Raul [1 ]
Piechocki, Robert J. [1 ]
McConville, Ryan [1 ]
Affiliation
[1] Univ Bristol, Sch Comp Sci Elect & Elect Engn & Engn Maths, Bristol, Avon, England
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
deep learning; multimodal/sensor fusion; passive WiFi-based HAR; self-supervised learning; vision transformer (ViT); WI-FI; GESTURE; CSI;
DOI
10.1049/wss2.12044
Chinese Library Classification
TN [electronics; communication technology];
Subject classification code
0809;
Abstract
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this study, new properties of the Transformer architecture for multimodal sensor fusion are explored. Different signal processing techniques are used to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). The Fusion Transformer, an attention-based model for multimodal and multisensor fusion, is first proposed. Experimental results show that the Fusion Transformer approach can achieve competitive results compared to a ResNet architecture but with far fewer resources. To further improve the model, a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL) is proposed. The self-supervised Fusion Transformer outperforms the baselines, achieving a macro F1-score of 95.9%. Finally, this study shows how this approach significantly outperforms the others when trained with as little as 1% (2 min) to 20% (40 min) of the labelled training data.
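Of the image-based features the abstract lists, the Markov transition field is the least widely known. As a rough illustration (not the authors' implementation, whose binning strategy and parameters are not given here), an MTF quantile-bins a 1-D signal, estimates the first-order transition probabilities between bins, and expands them into an N x N image indexed by sample pairs:

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Markov transition field (MTF) of a 1-D signal.

    Quantile-bin the signal, estimate the first-order Markov
    transition matrix between bins, then expand it into an
    N x N field: MTF[i, j] = P(bin(x[j]) | bin(x[i])).
    """
    x = np.asarray(x, dtype=float)
    # Quantile binning: each sample gets a bin label in [0, n_bins)
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    labels = np.digitize(x, edges)

    # Count transitions between consecutive samples
    counts = np.zeros((n_bins, n_bins))
    np.add.at(counts, (labels[:-1], labels[1:]), 1)

    # Row-normalise to probabilities (guard rows with no transitions)
    row_sums = counts.sum(axis=1, keepdims=True)
    trans = np.divide(counts, row_sums,
                      out=np.zeros_like(counts), where=row_sums > 0)

    # Expand the Q x Q transition matrix into an N x N field
    return trans[np.ix_(labels, labels)]

mtf = markov_transition_field(np.sin(np.linspace(0, 6 * np.pi, 64)), n_bins=4)
```

The resulting 2-D array can be treated like the spectrogram and scalogram features, i.e. fed to the Transformer as an image-like input channel.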
Pages: 149-160
Page count: 12
Related papers
50 records in total
  • [21] Self-Supervised Graph Transformer for Deepfake Detection
    Khormali, Aminollah
    Yuan, Jiann-Shiun
    IEEE ACCESS, 2024, 12 : 58114 - 58127
  • [22] SFT: Few-Shot Learning via Self-Supervised Feature Fusion With Transformer
    Lim, Jit Yan
    Lim, Kian Ming
    Lee, Chin Poo
    Tan, Yong Xuan
    IEEE ACCESS, 2024, 12 : 86690 - 86703
  • [23] Multi Self-Supervised Pre-Finetuned Transformer Fusion for Better Vehicle Detection
    Zheng, Juwu
    Ren, Jiangtao
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, : 1 - 15
  • [24] Surgical Gesture Recognition in Laparoscopic Tasks Based on the Transformer Network and Self-Supervised Learning
    Gazis, Athanasios
    Karaiskos, Pantelis
    Loukas, Constantinos
    BIOENGINEERING-BASEL, 2022, 9 (12):
  • [25] Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition
    Pan, Xichen
    Chen, Peiyu
    Gong, Yichen
    Zhou, Helong
    Wang, Xinbing
    Lin, Zhouhan
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 4491 - 4503
  • [26] SelfAct: Personalized Activity Recognition Based on Self-Supervised and Active Learning
    Arrotta, Luca
    Civitarese, Gabriele
    Bettini, Claudio
    MOBILE AND UBIQUITOUS SYSTEMS: COMPUTING, NETWORKING AND SERVICES, MOBIQUITOUS 2023, PT I, 2024, 593 : 375 - 391
  • [27] Assessing the State of Self-Supervised Human Activity Recognition Using Wearables
    Haresamudram, Harish
    Essa, Irfan
    Plotz, Thomas
    PROCEEDINGS OF THE ACM ON INTERACTIVE MOBILE WEARABLE AND UBIQUITOUS TECHNOLOGIES-IMWUT, 2022, 6 (03):
  • [28] Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks
    Zadeh, Mohammad Zaki
    Babu, Ashwin Ramesh
    Jaiswal, Ashish
    Kyrarini, Maria
    Makedon, Fillia
    THE 14TH ACM INTERNATIONAL CONFERENCE ON PERVASIVE TECHNOLOGIES RELATED TO ASSISTIVE ENVIRONMENTS, PETRA 2021, 2021, : 171 - 176
  • [29] MULTIMODAL TRANSFORMER FUSION FOR CONTINUOUS EMOTION RECOGNITION
    Huang, Jian
    Tao, Jianhua
    Liu, Bin
    Lian, Zheng
    Niu, Mingyue
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 3507 - 3511
  • [30] Self-supervised modal optimization transformer for image captioning
    Wang, Ye
    Li, Daitianxia
    Liu, Qun
    Liu, Li
    Wang, Guoyin
    NEURAL COMPUTING AND APPLICATIONS, 2024, 36 (31) : 19863 - 19878