Efficient Real-Time Smart Keyword Spotting Using Spectrogram-Based Hybrid CNN-LSTM for Edge System

Authors
Syafalni, Infall [1, 2, 3]
Amadeus, Clarence [1]
Sutisna, Nana [1, 3]
Adiono, Trio [1]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Univ Ctr Excellence Microelect, Bandung 40132, Indonesia
[3] Interuniv Microelect Ctr IMEC, B-3001 Leuven, Belgium
Keywords
Edge computing; hybrid CNN-LSTM; keyword spotting; real-time; embedded devices
DOI
10.1109/ACCESS.2024.3380350
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Keyword spotting (KWS) is the task of recognizing spoken command words from a database. With the recent growth of human-machine interaction applications, KWS systems require real-time performance, for which edge computing is a preferable option. For a KWS system to run fast and in real time, a low-complexity yet highly accurate AI model is mandatory. In this paper, we propose a comprehensive voice command recognition system design and its hardware implementation. The AI model used in this system is SpectroNet-based, an efficient hybrid CNN-LSTM architecture with low complexity. Jetson Xavier NX is used as the edge device because of its strong computational power as an embedded device. The implementation results show that the proposed method achieves good accuracy, with no accuracy drop between the model implemented on a PC and on the Jetson Xavier. However, the inference time is quite high, at 180 ms/step. To improve the speed of the system, the TensorRT library is used to further optimize the model. The optimization is found to be effective, reducing the total number of operations performed in SpectroNet by 59.35% when FP32 precision is used and by 59.63% when FP16 precision is used. The model is also sped up by 45% in FP32 precision mode and by 62% in FP16 precision mode. However, there is a slight accuracy drop of 2.68% in FP32 precision mode and 4.84% in FP16 precision mode. This slight drop in accuracy is considered negligible compared to the performance boost that TensorRT gives. The work is useful for intelligent control systems such as smart vehicles, smartphones, computers, and smart communications.
Pages: 43109-43125
Number of pages: 17