Efficient Real-Time Smart Keyword Spotting Using Spectrogram-Based Hybrid CNN-LSTM for Edge System

Times cited: 0
Authors
Syafalni, Infall [1, 2, 3]
Amadeus, Clarence [1]
Sutisna, Nana [1, 3]
Adiono, Trio [1]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Univ Ctr Excellence Microelect, Bandung 40132, Indonesia
[3] Interuniv Microelect Ctr IMEC, B-3001 Leuven, Belgium
Keywords
Edge computing; hybrid CNN-LSTM; keyword spotting; real-time; embedded devices
DOI
10.1109/ACCESS.2024.3380350
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Keyword Spotting (KWS) is the task of recognizing spoken command words from a database. With the recent growth of human-machine interaction applications, KWS systems require real-time performance, for which edge computing is a preferable option. To enable fast, real-time KWS, a low-complexity yet highly accurate AI model is essential. In this paper, we propose a comprehensive voice command recognition system design and its hardware implementation. The AI model used in this system is based on SpectroNet, an efficient, low-complexity hybrid CNN-LSTM architecture. The Jetson Xavier NX is chosen as the edge device because of its strong computational power for an embedded device. The implementation results show that the proposed method achieves good accuracy, with no accuracy drop between the model running on a PC and on the Jetson Xavier NX. However, the inference time is relatively high, at 180 ms/step. To improve the speed of the system, the TensorRT library is used to further optimize the model. The optimization proves effective, reducing the total number of operations performed in SpectroNet by 59.35% with FP32 precision and by 59.63% with FP16 precision. The model is also sped up by 45% in FP32 precision mode and by 62% in FP16 precision mode. However, this comes with a slight accuracy drop of 2.68% in FP32 mode and 4.84% in FP16 mode, which is considered negligible compared to the performance boost that TensorRT provides. The work is useful for intelligent control systems such as smart vehicles, smartphones, computers, and smart communications.
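The abstract describes the pipeline only at a high level: a spectrogram front-end feeding a hybrid CNN-LSTM classifier. The NumPy code below is a minimal sketch of that general data flow, not SpectroNet itself. Every concrete choice is a hypothetical assumption made for illustration: 16 kHz audio, 25 ms frames with a 10 ms hop, a single 3x3 convolutional layer with 8 filters, max pooling along frequency only, a 32-unit LSTM, 10 keyword classes, and random untrained weights. The TensorRT optimization step is not shown.

```python
import numpy as np

def log_spectrogram(audio, frame_len=400, hop=160):
    """Log-magnitude STFT; 400/160 samples = 25 ms / 10 ms at an assumed 16 kHz."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # (time, freq) = (n_frames, frame_len//2 + 1)
    return np.log(mag + 1e-6)

def conv2d_relu(x, kernels):
    """'Valid' 2-D cross-correlation (as in DL libraries) followed by ReLU.
    x: (T, F); kernels: (C, k, k) -> (C, T-k+1, F-k+1)."""
    _, k, _ = kernels.shape
    windows = np.lib.stride_tricks.sliding_window_view(x, (k, k))  # (T-k+1, F-k+1, k, k)
    return np.maximum(np.einsum("tfij,cij->ctf", windows, kernels), 0.0)

def max_pool_freq(x, p=4):
    """Max-pool along frequency only, so every time step survives for the LSTM."""
    c, t, f = x.shape
    f_trim = (f // p) * p
    return x[:, :, :f_trim].reshape(c, t, f_trim // p, p).max(axis=3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_state(seq, W, U, b):
    """Standard LSTM over seq (T, D); returns the final hidden state (H,).
    W: (4H, D), U: (4H, H), b: (4H,); gate order: input, forget, cell, output."""
    h_dim = U.shape[1]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    for x_t in seq:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:h_dim])
        f = sigmoid(z[h_dim:2 * h_dim])
        g = np.tanh(z[2 * h_dim:3 * h_dim])
        o = sigmoid(z[3 * h_dim:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kws_forward(audio, params):
    """Spectrogram -> CNN (local time-frequency patterns) -> LSTM (temporal context) -> softmax."""
    spec = log_spectrogram(audio)                                # (T, F)
    feat = max_pool_freq(conv2d_relu(spec, params["kernels"]))   # (C, T', F')
    c, t, f = feat.shape
    seq = feat.transpose(1, 0, 2).reshape(t, c * f)              # one vector per time step
    h = lstm_last_state(seq, params["W"], params["U"], params["b"])
    return softmax(params["W_out"] @ h + params["b_out"])        # keyword probabilities

# Hypothetical sizes and random (untrained) weights, for shape-checking only.
rng = np.random.default_rng(0)
n_filters, k, hidden, n_classes = 8, 3, 32, 10
d_in = n_filters * ((400 // 2 + 1 - k + 1) // 4)  # 8 filters * 49 pooled bins = 392
params = {
    "kernels": rng.normal(0, 0.1, (n_filters, k, k)),
    "W": rng.normal(0, 0.1, (4 * hidden, d_in)),
    "U": rng.normal(0, 0.1, (4 * hidden, hidden)),
    "b": np.zeros(4 * hidden),
    "W_out": rng.normal(0, 0.1, (n_classes, hidden)),
    "b_out": np.zeros(n_classes),
}
audio = rng.normal(0, 0.1, 16000)  # 1 s of noise stands in for a recorded command
probs = kws_forward(audio, params)
print(probs.shape, round(float(probs.sum()), 6))
```

Pooling only along frequency keeps one feature vector per spectrogram frame, so the LSTM still sees the full temporal sequence of the utterance. A deployed model of this kind would typically be built and trained in a deep-learning framework and then compiled with TensorRT, which is where the FP32 and FP16 precision modes discussed in the abstract apply.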
Pages: 43109-43125
Number of pages: 17