Self-supervised multimodal fusion transformer for passive activity recognition

Cited by: 4
Authors
Koupai, Armand K. [1 ]
Bocus, Mohammud J. [1 ]
Santos-Rodriguez, Raul [1 ]
Piechocki, Robert J. [1 ]
McConville, Ryan [1 ]
Institution
[1] Univ Bristol, Sch Comp Sci Elect & Elect Engn & Engn Maths, Bristol, Avon, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
deep learning; multi modal/sensor fusion; passive WiFi-based HAR; self-supervised learning; vision transformer (ViT); WI-FI; GESTURE; CSI;
DOI
10.1049/wss2.12044
Chinese Library Classification (CLC) number
TN [Electronic technology; communication technology];
Discipline classification code
0809;
Abstract
The pervasiveness of Wi-Fi signals provides significant opportunities for human sensing and activity recognition in fields such as healthcare. The sensors most commonly used for passive Wi-Fi sensing are based on passive Wi-Fi radar (PWR) and channel state information (CSI) data; however, current systems do not effectively exploit the information acquired through multiple sensors to recognise the different activities. In this study, new properties of the Transformer architecture for multimodal sensor fusion are explored. Different signal processing techniques are used to extract multiple image-based features from PWR and CSI data, such as spectrograms, scalograms and Markov transition fields (MTF). First, the Fusion Transformer, an attention-based model for multimodal and multi-sensor fusion, is proposed. Experimental results show that the Fusion Transformer can achieve results competitive with a ResNet architecture while using far fewer resources. To further improve the model, a simple and effective framework for multimodal and multi-sensor self-supervised learning (SSL) is proposed. The self-supervised Fusion Transformer outperforms the baselines, achieving a macro F1-score of 95.9%. Finally, this study shows that the approach significantly outperforms the others when trained with between 1% (2 min) and 20% (40 min) of the labelled training data.
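As a rough illustration of the image-based features named in the abstract (not the authors' code), the sketch below computes a spectrogram and a Markov transition field from a simulated 1-D CSI amplitude stream; the test signal, sampling rate, window sizes and bin count are all assumptions chosen for the example.

```python
import numpy as np
from scipy.signal import spectrogram

# Hypothetical stand-in for a real CSI amplitude stream.
rng = np.random.default_rng(0)
fs = 100  # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)
csi = np.sin(2 * np.pi * 2 * t) + 0.3 * rng.standard_normal(t.size)

# Spectrogram: a time-frequency image of the signal.
f, t_seg, Sxx = spectrogram(csi, fs=fs, nperseg=64, noverlap=32)

def markov_transition_field(x, n_bins=8):
    """Markov transition field: quantise x into n_bins quantile bins,
    estimate the bin-to-bin transition matrix, then expand it so that
    mtf[i, j] is the transition probability from the bin of sample i
    to the bin of sample j (an N x N image)."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    states = np.digitize(x, edges)                 # per-sample bin index, 0..n_bins-1
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):      # count observed transitions
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalise to probabilities
    return W[np.ix_(states, states)]

mtf = markov_transition_field(csi[:200])
print(Sxx.shape, mtf.shape)
```

Both outputs are 2-D arrays and can be treated as images, which is what allows a vision-transformer-style model to consume them as fused input channels.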
Pages: 149-160
Page count: 12
Related papers
50 records in total
  • [1] Multimodal Image Fusion via Self-Supervised Transformer
    Zhang, Jing
    Liu, Yu
    Liu, Aiping
    Xie, Qingguo
    Ward, Rabab
    Wang, Z. Jane
    Chen, Xun
    IEEE SENSORS JOURNAL, 2023, 23 (09): 9796-9807
  • [2] Self-supervised representation learning using multimodal Transformer for emotion recognition
    Goetz, Theresa
    Arora, Pulkit
    Erick, F. X.
    Holzer, Nina
    Sawant, Shrutika
    PROCEEDINGS OF THE 8TH INTERNATIONAL WORKSHOP ON SENSOR-BASED ACTIVITY RECOGNITION AND ARTIFICIAL INTELLIGENCE, IWOAR 2023, 2023
  • [3] Transformer-Based Self-Supervised Multimodal Representation Learning for Wearable Emotion Recognition
    Wu, Yujin
    Daoudi, Mohamed
    Amad, Ali
    IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, 2024, 15 (01): 157-172
  • [4] Multimodal Emotion Recognition With Transformer-Based Self Supervised Feature Fusion
    Siriwardhana, Shamane
    Kaluarachchi, Tharindu
    Billinghurst, Mark
    Nanayakkara, Suranga
    IEEE ACCESS, 2020, 8: 176274-176285
  • [5] STFNet: Self-Supervised Transformer for Infrared and Visible Image Fusion
    Liu, Qiao
    Pi, Jiatian
    Gao, Peng
    Yuan, Di
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (02): 1513-1526
  • [6] Transformer-Based Self-Supervised Learning for Emotion Recognition
    Vazquez-Rodriguez, Juan
    Lefebvre, Gregoire
    Cumin, Julien
    Crowley, James L.
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022: 2605-2612
  • [7] Self-supervised Video Transformer
    Ranasinghe, Kanchana
    Naseer, Muzammal
    Khan, Salman
    Khan, Fahad Shahbaz
    Ryoo, Michael S.
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 2864-2874
  • [8] Self-supervised representation learning for surgical activity recognition
    Paysan, Daniel
    Haug, Luis
    Bajka, Michael
    Oelhafen, Markus
    Buhmann, Joachim M.
    INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, 2021, 16 (11): 2037-2044
  • [10] Self-Supervised MultiModal Versatile Networks
    Alayrac, Jean-Baptiste
    Recasens, Adria
    Schneider, Rosalia
    Arandjelovic, Relja
    Ramapuram, Jason
    De Fauw, Jeffrey
    Smaira, Lucas
    Dieleman, Sander
    Zisserman, Andrew
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33