Self-supervised multimodal fusion transformer for passive activity recognition

Cited: 4
Authors
Koupai, Armand K. [1 ]
Bocus, Mohammud J. [1 ]
Santos-Rodriguez, Raul [1 ]
Piechocki, Robert J. [1 ]
McConville, Ryan [1 ]
Affiliation
[1] Univ Bristol, Sch Comp Sci Elect & Elect Engn & Engn Maths, Bristol, Avon, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
deep learning; multimodal/sensor fusion; passive WiFi-based HAR; self-supervised learning; vision transformer (ViT); WI-FI; GESTURE; CSI
DOI
10.1049/wss2.12044
CLC number
TN [Electronic technology, communication technology]
Discipline code
0809
Abstract
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise different activities. In this study, new properties of the Transformer architecture for multimodal sensor fusion are explored. Different signal processing techniques are used to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). The Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion, is first proposed. Experimental results show that the Fusion Transformer achieves results competitive with a ResNet architecture but with far fewer resources. To further improve the model, a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL) is proposed. The self-supervised Fusion Transformer outperforms the baselines, achieving a macro F1-score of 95.9%. Finally, this study shows that the approach significantly outperforms the others when trained with as little as 1% (2 min) to 20% (40 min) of labelled training data.
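The fusion idea the abstract describes — embedding image-based features (e.g. CSI spectrograms and PWR scalograms) as token sequences and letting self-attention mix information across modalities — can be illustrated with a minimal numpy sketch. All names, dimensions and the random projections below are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, d_k=16):
    # Random projections stand in for learned Q/K/V weight matrices.
    d = tokens.shape[-1]
    Wq, Wk, Wv = (rng.normal(size=(d, d_k)) for _ in range(3))
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # token-to-token attention weights
    return attn @ V                          # attended token representations

# Hypothetical patch embeddings from two modalities:
# 4 CSI-spectrogram tokens and 4 PWR-scalogram tokens, each 32-dim.
csi_tokens = rng.normal(size=(4, 32))
pwr_tokens = rng.normal(size=(4, 32))

# Fusion by concatenation along the token axis: attention can then
# weight CSI tokens against PWR tokens (and vice versa) freely.
fused = self_attention(np.concatenate([csi_tokens, pwr_tokens], axis=0))
pooled = fused.mean(axis=0)  # simple global pooling for a classifier head
print(pooled.shape)          # (16,)
```

Concatenating tokens before attention (rather than averaging per-modality predictions) is what lets a transformer learn cross-modal interactions; a real model would use learned weights, multiple heads and layers.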
Pages: 149-160 (12 pages)
Related papers (50 total)
  • [41] Taleb, A.; Lippert, C.; Klein, T.; Nabi, M. Multimodal Self-supervised Learning for Medical Image Analysis. Information Processing in Medical Imaging, IPMI 2021, 2021, 12729: 661-673
  • [42] An, C.; Wang, Y.; Zhang, J.; Nguyen, T. Q. Self-Supervised Rigid Registration for Multimodal Retinal Images. IEEE Transactions on Image Processing, 2022, 31: 5733-5747
  • [43] Valada, A.; Mohan, R.; Burgard, W. Self-Supervised Model Adaptation for Multimodal Semantic Segmentation. International Journal of Computer Vision, 2020, 128: 1239-1285
  • [44] Mou, L.; Zhou, C.; Xie, P.; Zhao, P.; Jain, R.; Gao, W.; Yin, B. Isotropic Self-Supervised Learning for Driver Drowsiness Detection With Attention-Based Multimodal Fusion. IEEE Transactions on Multimedia, 2023, 25: 529-542
  • [45] Shu, H.; Meng, C.; de Meo, P.; Wang, Q.; Zhu, J. Self-Supervised Hypergraph Learning for Enhanced Multimodal Representation. IEEE Access, 2024, 12: 20830-20839
  • [46] Yang, H.; Ren, Z.; Yuan, H.; Xu, Z.; Zhou, J. Contrastive self-supervised representation learning without negative samples for multimodal human action recognition. Frontiers in Neuroscience, 2023, 17
  • [47] Chaudhari, A.; Bhatt, C.; Krishna, A.; Travieso-Gonzalez, C. M. Facial Emotion Recognition with Inter-Modality-Attention-Transformer-Based Self-Supervised Learning. Electronics, 2023, 12 (02)
  • [48] Ijaz, M.; Diaz, R.; Chen, C. Multimodal Transformer for Nursing Activity Recognition. arXiv, 2022
  • [49] Pathak, A.; Pal, S. K. On the Convergence of a Self-Supervised Vowel Recognition System. Pattern Recognition, 1987, 20 (02): 237-244
  • [50] Ijaz, M.; Diaz, R.; Chen, C. Multimodal Transformer for Nursing Activity Recognition. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2022, 2022: 2064-2073