WiVi: WiFi-Video Cross-Modal Fusion based Multi-Path Gait Recognition System

Cited by: 3
Authors
Fan, Jinmeng [1 ,2 ,3 ]
Zhou, Hao [1 ,2 ,3 ]
Zhou, Fengyu [1 ,2 ,3 ]
Wang, Xiaoyan [4 ]
Liu, Zhi [5 ]
Li, Xiang-Yang [1 ,2 ,3 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, LINKE Lab, Hefei, Peoples R China
[2] Univ Sci & Technol China, CAS Key Lab Wireless Opt Commun, Hefei, Peoples R China
[3] Deqing Alpha Innovat Inst, Huzhou, Zhejiang, Peoples R China
[4] Ibaraki Univ, Grad Sch Sci & Engn, Ibaraki, Japan
[5] Univ Electrocommun, Dept Comp & Network Engn, Tokyo, Japan
Funding
National Key R&D Program of China;
Keywords
DOI
10.1109/IWQoS54832.2022.9812893
CLC number
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
WiFi-based gait recognition is an attractive method for device-free user identification, but the path-sensitive Channel State Information (CSI) hinders its application in multi-path environments, exacerbating sampling and deployment costs (i.e., a large number of samples and multiple specially placed devices). On the other hand, although video-based ideal CSI generation is promising for dramatically reducing samples, the missing environment-related information in the ideal CSI makes it unsuitable for general indoor scenarios with multiple walking paths. In this paper, we propose WiVi, a WiFi-video cross-modal fusion based multi-path gait recognition system that simultaneously needs fewer samples and fewer devices. When the subject walks naturally in the room, we determine whether he/she is walking on one of the predefined judgment paths with a K-Nearest Neighbors (KNN) classifier operating on WiFi-based human localization results. For each judgment path, we generate the ideal CSI through video-based simulation to decrease the number of needed samples, and adopt two separate neural networks (NNs) to perform environment-aware comparison between the ideal and measured CSIs. The first network is supervised by measured CSI samples and learns to obtain semi-ideal CSI features that contain the room-specific 'accent', i.e., the long-term environmental influence typically caused by the room layout. The second network is trained for similarity evaluation between the semi-ideal and measured features in the presence of short-term environmental influences such as channel variation or noise. We implement a prototype system and conduct extensive experiments to evaluate its performance. Experimental results show that WiVi's recognition accuracy ranges from 85.4% for a 6-person group to 98.0% for a 3-person group. Compared with single-path gait recognition systems, we achieve an average performance improvement of 113.8%. Compared with other multi-path gait recognition systems, we achieve similar or even better performance while reducing the number of needed samples by 57.1-93.7%.
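The judgment-path detection step in the abstract (a KNN classifier over WiFi-based localization results) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the trace format, the hand-picked features (mean position and overall heading), and the path labels are all assumptions.

```python
# Hypothetical sketch of WiVi's judgment-path detection: a plain KNN
# classifier decides whether a WiFi-localized walking trace lies on one
# of the predefined judgment paths. Features and data are illustrative.
import math
from collections import Counter

def trace_feature(trace):
    """Summarize a localization trace [(x, y), ...] as a fixed-length
    feature: mean position plus overall heading of the walk."""
    xs = [p[0] for p in trace]
    ys = [p[1] for p in trace]
    n = len(trace)
    heading = math.atan2(ys[-1] - ys[0], xs[-1] - xs[0])
    return (sum(xs) / n, sum(ys) / n, heading)

def knn_predict(train, labels, query, k=3):
    """Majority label among the k nearest training features."""
    dists = sorted((math.dist(f, query), lab) for f, lab in zip(train, labels))
    top = [lab for _, lab in dists[:k]]
    return Counter(top).most_common(1)[0][0]

# Toy data: traces along judgment path "A" (x ~ 1), judgment path "B"
# (y ~ 1), and walks that follow neither path ("off-path").
train_traces = [
    [(1.0, 0.0), (1.0, 1.0), (1.0, 2.0)],
    [(1.1, 0.0), (1.1, 1.0), (1.1, 2.0)],
    [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)],
    [(0.0, 0.9), (1.0, 0.9), (2.0, 0.9)],
    [(3.0, 3.0), (3.5, 2.5), (4.0, 2.0)],
    [(3.0, 2.8), (3.6, 2.4), (4.2, 2.0)],
]
train_labels = ["A", "A", "B", "B", "off-path", "off-path"]
features = [trace_feature(t) for t in train_traces]

# A new trace close to path A is classified accordingly.
query = trace_feature([(1.05, 0.1), (1.0, 1.1), (0.95, 2.0)])
print(knn_predict(features, train_labels, query))  # → A
```

In the full system, only traces classified onto a judgment path are forwarded to the CSI-comparison networks; off-path walks are ignored.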
Pages: 10
Related Papers
50 records in total
  • [41] Recognition of Micro-Motion Space Targets Based on Attention-Augmented Cross-Modal Feature Fusion Recognition Network
    Tian, Xudong
    Bai, Xueru
    Zhou, Feng
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2023, 61
  • [42] ITF-WPI: Image and text based cross-modal feature fusion model for wolfberry pest recognition
    Dai, Guowei
    Fan, Jingchao
    Dewi, Christine
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2023, 212
  • [43] Cross-modal photo-caricature face recognition based on dynamic multi-task learning
    Ming, Zuheng
    Burie, Jean-Christophe
    Luqman, Muhammad Muzzamil
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2021, 24 (1-2) : 33 - 48
  • [45] UCEMA: Uni-modal and cross-modal encoding network based on multi-head attention for emotion recognition in conversation
    Zhao, Hongkun
    Liu, Siyuan
    Chen, Yang
    Kong, Fanmin
    Zeng, Qingtian
    Li, Kang
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [46] Progressive learning in cross-modal cross-scale fusion transformer for visible-infrared video-based person reidentification
    Mukhtar, Hamza
    Mukhtar, Umar Raza
    KNOWLEDGE-BASED SYSTEMS, 2024, 304
  • [47] Cross-Modal Learning Based on Semantic Correlation and Multi-Task Learning for Text-Video Retrieval
    Wu, Xiaoyu
    Wang, Tiantian
    Wang, Shengjin
    ELECTRONICS, 2020, 9 (12) : 1 - 17
  • [48] Attention-based cross-modal fusion for audio-visual voice activity detection in musical video streams
    Hou, Yuanbo
    Yu, Zhesong
    Liang, Xia
    Du, Xingjian
    Zhu, Bilei
    Ma, Zejun
    Botteldooren, Dick
    INTERSPEECH 2021, 2021, : 321 - 325
  • [49] Holistic-Based Cross-Attention Modal Fusion Network for Video Sign Language Recognition
    Gao, Qing
    Hu, Jing
    Mai, Haixing
    Ju, Zhaojie
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024,
  • [50] The Design of Multi-path Video Network Monitor System Based on TMS320DM642
    Zhao, Jie
    Li, Hailiang
    Mu, Hongmei
    PROCEEDINGS OF THE 2012 24TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2012, : 1964 - 1967