CubeLearn: End-to-End Learning for Human Motion Recognition From Raw mmWave Radar Signals

Cited by: 20
Authors
Zhao, Peijun [1, 2]
Lu, Chris Xiaoxuan [3 ]
Wang, Bing [1, 4]
Trigoni, Niki [1 ]
Markham, Andrew [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford OX1 3BW, England
[2] MIT, Dept Mech Engn, Cambridge, MA 02139 USA
[3] Univ Edinburgh, Dept Informat, Edinburgh EH8 9AB, Scotland
[4] Hong Kong Polytech Univ, Dept Aeronaut & Aviat Engn, Hong Kong, Peoples R China
Keywords
Doppler radar; Millimeter wave communication; Discrete Fourier transforms; Radar applications; Chirp; Neural networks; Convolutional neural networks; End-to-end neural network; mmWave radar; motion recognition
DOI
10.1109/JIOT.2023.3237494
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
mmWave FMCW radar has attracted a huge amount of research interest for human-centered applications in recent years, such as human gesture and activity recognition. Most existing pipelines are hybrids of conventional discrete Fourier transform (DFT) preprocessing and a deep neural network classifier, with a majority of previous works focusing on designing the downstream classifier to improve overall accuracy. In this work, we take a step back and look at the preprocessing module. To avoid the drawbacks of conventional DFT preprocessing, we propose a complex-weighted learnable preprocessing module, named CubeLearn, to directly extract features from raw radar signals and build an end-to-end deep neural network for mmWave FMCW radar motion recognition applications. Extensive experiments show that our CubeLearn module consistently improves the classification accuracies of different pipelines, especially benefiting simpler models, which are more likely to be used on edge devices due to their computational efficiency. We provide ablation studies on initialization methods and structure of the proposed module, as well as an evaluation of the running time on PC and edge devices. This work also serves as a comparison of different approaches toward data cube slicing. Through our task-agnostic design, we propose a first step toward a generic end-to-end solution for radar recognition problems.
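The idea of swapping a fixed DFT for a complex-weighted learnable layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `ComplexLinear`, the DFT-based initialization, and the data cube shape (8 chirps by 16 samples) are assumptions chosen for illustration. It only shows that a complex linear layer whose weights start as the DFT matrix reproduces the conventional range-Doppler transform, so any training would begin from the standard preprocessing.

```python
import numpy as np

def dft_matrix(n):
    """Return the n x n DFT matrix W with W[k, m] = exp(-2j*pi*k*m/n)."""
    k = np.arange(n).reshape(-1, 1)
    m = np.arange(n).reshape(1, -1)
    return np.exp(-2j * np.pi * k * m / n)

class ComplexLinear:
    """Hypothetical complex-weighted linear layer (illustrative only).

    The weight is initialized to the DFT matrix (an assumed choice).
    In a deep learning framework it would be a trainable parameter.
    """
    def __init__(self, n):
        self.weight = dft_matrix(n)

    def __call__(self, x):
        # x has shape (..., n); the transform acts on the last axis.
        return x @ self.weight.T

rng = np.random.default_rng(0)
# Assumed raw data cube slice: 8 chirps (slow time) x 16 samples (fast time).
cube = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))

range_layer = ComplexLinear(16)    # acts along fast time
doppler_layer = ComplexLinear(8)   # acts along slow time

range_out = range_layer(cube)                # range transform per chirp
rd_out = doppler_layer(range_out.T).T        # Doppler transform per range bin
features = np.abs(rd_out)                    # magnitude map for a classifier
```

At initialization the two layers are equivalent to a 2-D FFT of the cube. Gradient updates to `weight` would then be free to move away from the fixed Fourier basis.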
Pages: 10236-10249
Page count: 14