WiVi: WiFi-Video Cross-Modal Fusion based Multi-Path Gait Recognition System

Cited by: 3
Authors
Fan, Jinmeng [1 ,2 ,3 ]
Zhou, Hao [1 ,2 ,3 ]
Zhou, Fengyu [1 ,2 ,3 ]
Wang, Xiaoyan [4 ]
Liu, Zhi [5 ]
Li, Xiang-Yang [1 ,2 ,3 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, LINKE Lab, Hefei, Peoples R China
[2] Univ Sci & Technol China, CAS Key Lab Wireless Opt Commun, Hefei, Peoples R China
[3] Deqing Alpha Innovat Inst, Huzhou, Zhejiang, Peoples R China
[4] Ibaraki Univ, Grad Sch Sci & Engn, Ibaraki, Japan
[5] Univ Electrocommun, Dept Comp & Network Engn, Tokyo, Japan
Funding
National Key R&D Program of China;
Keywords
DOI
10.1109/IWQoS54832.2022.9812893
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
WiFi-based gait recognition is an attractive method for device-free user identification, but the path-sensitive Channel State Information (CSI) hinders its application in multi-path environments, exacerbating sampling and deployment costs (i.e., a large number of samples and multiple specially placed devices). On the other hand, although video-based ideal CSI generation is promising for dramatically reducing samples, the missing environment-related information in the ideal CSI makes it unsuitable for general indoor scenarios with multiple walking paths. In this paper, we propose WiVi, a WiFi-video cross-modal fusion based multi-path gait recognition system that simultaneously needs fewer samples and fewer devices. When the subject walks naturally in the room, we determine whether he/she is walking on one of the predefined judgment paths with a K-Nearest Neighbors (KNN) classifier operating on WiFi-based human localization results. For each judgment path, we generate the ideal CSI through video-based simulation to decrease the number of needed samples, and adopt two separate neural networks (NNs) to perform environment-aware comparison between the ideal and measured CSIs. The first network is supervised by measured CSI samples and learns to obtain semi-ideal CSI features that contain the room-specific 'accent', i.e., the long-term environmental influence typically caused by the room layout. The second network is trained for similarity evaluation between the semi-ideal and measured features in the presence of short-term environmental influences such as channel variation or noise. We implement a prototype system and conduct extensive experiments to evaluate its performance. Experimental results show that WiVi's recognition accuracy ranges from 85.4% for a 6-person group to 98.0% for a 3-person group. Compared with single-path gait recognition systems, we achieve an average performance improvement of 113.8%. Compared with other multi-path gait recognition systems, we achieve similar or even better performance while reducing the number of needed samples by 57.1-93.7%.
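The judgment-path detection step in the abstract (a KNN classifier over WiFi-based localization results) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the trace format, the hand-picked features (mean position and overall heading), and the path labels are all assumptions.

```python
# Hypothetical sketch of WiVi's judgment-path detection: a plain KNN
# classifier decides whether a WiFi-localized walking trace lies on one
# of the predefined judgment paths. Features and data are illustrative.
import math
from collections import Counter

def trace_feature(trace):
    """Summarize a localization trace [(x, y), ...] as a fixed-length
    feature: mean position plus overall heading of the walk."""
    xs = [p[0] for p in trace]
    ys = [p[1] for p in trace]
    n = len(trace)
    heading = math.atan2(ys[-1] - ys[0], xs[-1] - xs[0])
    return (sum(xs) / n, sum(ys) / n, heading)

def knn_predict(train, labels, query, k=3):
    """Majority label among the k nearest training features."""
    dists = sorted((math.dist(f, query), lab) for f, lab in zip(train, labels))
    top = [lab for _, lab in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# Toy data: traces along judgment path "A" (x ~ 1), judgment path "B"
# (y ~ 1), and walks that follow neither path ("off-path").
train_traces = [
    [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)],
    [(1.1, 0.0), (1.1, 1.0), (1.1, 2.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    [(0.0, 0.9), (1.0, 0.9), (2.0, 0.9)],
    [(3.0, 3.0), (3.5, 2.5), (4.0, 2.0)],
    [(3.0, 2.8), (3.6, 2.4), (4.2, 2.0)],
]
train_labels = ["A", "A", "B", "B", "off-path", "off-path"]
features = [trace_feature(t) for t in train_traces]

# A new trace close to path A is classified accordingly.
query = trace_feature([(1.05, 0.1), (1.0, 1.1), (0.95, 2.0)])
print(knn_predict(features, train_labels, query))  # → A
```

In the full system, only traces classified onto a judgment path are forwarded to the CSI-comparison networks; off-path walks are ignored.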
Pages: 10
Related Papers
50 records in total
  • [41] Recognition of Micro-Motion Space Targets Based on Attention-Augmented Cross-Modal Feature Fusion Recognition Network
    Tian, Xudong
    Bai, Xueru
    Zhou, Feng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [42] ITF-WPI: Image and text based cross-modal feature fusion model for wolfberry pest recognition
    Dai, Guowei
    Fan, Jingchao
    Dewi, Christine
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 212
  • [43] Cross-modal photo-caricature face recognition based on dynamic multi-task learning
    Ming, Zuheng
    Burie, Jean-Christophe
    Luqman, Muhammad Muzzamil
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2021, 24 (1-2) : 33 - 48
  • [45] UCEMA: Uni-modal and cross-modal encoding network based on multi-head attention for emotion recognition in conversation
    Zhao, Hongkun
    Liu, Siyuan
    Chen, Yang
    Kong, Fanmin
    Zeng, Qingtian
    Li, Kang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [46] Progressive learning in cross-modal cross-scale fusion transformer for visible-infrared video-based person reidentification
    Mukhtar, Hamza
    Mukhtar, Umar Raza
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [47] Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
    Wu, Xiaoyu
    Wang, Tiantian
    Wang, Shengjin
    ELECTRONICS, 2020, 9 (12) : 1 - 17
  • [48] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams
    Hou, Yuanbo
    Yu, Zhesong
    Liang, Xia
    Du, Xingjian
    Zhu, Bilei
    Ma, Zejun
    Botteldooren, Dick
    INTERSPEECH 2021, 2021, : 321 - 325
  • [49] Holistic-Based Cross-Attention Modal Fusion Network for Video Sign Language Recognition
    Gao, Qing
    Hu, Jing
    Mai, Haixing
    Ju, Zhaojie
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [50] The Design of Multi-path Video Network Monitor System Based on TMS320DM642
    Zhao, Jie
    Li, Hailiang
    Mu, Hongmei
    PROCEEDINGS OF THE 2012 24TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2012, : 1964 - 1967