Efficient Real-Time Smart Keyword Spotting Using Spectrogram-Based Hybrid CNN-LSTM for Edge System

Times cited: 0
Authors
Syafalni, Infall [1, 2, 3]
Amadeus, Clarence [1]
Sutisna, Nana [1, 3]
Adiono, Trio [1]
Affiliations
[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia
[2] Bandung Inst Technol, Univ Ctr Excellence Microelect, Bandung 40132, Indonesia
[3] Interuniv Microelect Ctr IMEC, B-3001 Leuven, Belgium
Keywords
Edge computing; hybrid CNN-LSTM; keyword spotting; real-time; embedded devices
DOI
10.1109/ACCESS.2024.3380350
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Keyword Spotting (KWS) is the task of recognizing spoken command words from a database. With recent applications in human-machine interaction, KWS systems require real-time performance, for which edge computing is the preferable option. To allow KWS systems to run in fast, real-time implementations, a low-complexity yet highly accurate AI model is mandatory. In this paper, we propose a comprehensive voice command recognition system design and its hardware implementation. The proposed AI model is based on SpectroNet, an efficient hybrid CNN-LSTM architecture with low complexity. The Jetson Xavier NX is chosen as the edge device because of its strong computational power for an embedded device. The implementation results show that the proposed method performs well in terms of accuracy, with no accuracy drop between the model running on a PC and on the Jetson Xavier NX. However, the inference time is relatively high, at 180 ms/step. To improve the speed of the system, the TensorRT library is used to further optimize the model. This optimization proves effective, reducing the total number of operations performed in SpectroNet by 59.35% with FP32 precision and by 59.63% with FP16 precision. The model is also sped up by 45% in FP32 precision mode and by 62% in FP16 precision mode. However, accuracy drops slightly, by 2.68% in FP32 mode and by 4.84% in FP16 mode. This slight accuracy drop is considered negligible compared with the performance boost that TensorRT provides. The work is applicable to intelligent control systems such as smart vehicles, smartphones, computers, and smart communications.
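The abstract gives a 180 ms/step baseline and TensorRT speedups of 45% (FP32) and 62% (FP16), but does not say whether "sped up by X%" means the inference time shrank by X% or the throughput grew by X%. The short sketch below only does arithmetic on those reported numbers to show how far apart the two readings are; the resulting latencies are illustrative, not figures reported by the authors, and the function names are hypothetical.

```python
# Illustrative arithmetic only: the per-step latencies computed here are NOT
# reported results; they show two possible readings of "sped up by X%".

BASELINE_MS_PER_STEP = 180.0  # from the abstract (unoptimized model on Jetson Xavier NX)

# TensorRT speedups per precision mode, as stated in the abstract.
SPEEDUP_PCT = {"FP32": 45.0, "FP16": 62.0}


def latency_if_time_reduced(baseline_ms: float, pct: float) -> float:
    """Reading A (assumption): inference time per step shrinks by pct percent."""
    return baseline_ms * (1.0 - pct / 100.0)


def latency_if_throughput_increased(baseline_ms: float, pct: float) -> float:
    """Reading B (assumption): steps per second grow by pct percent."""
    return baseline_ms / (1.0 + pct / 100.0)


if __name__ == "__main__":
    for mode, pct in SPEEDUP_PCT.items():
        a = latency_if_time_reduced(BASELINE_MS_PER_STEP, pct)
        b = latency_if_throughput_increased(BASELINE_MS_PER_STEP, pct)
        print(f"{mode}: {a:.1f} ms/step (time reduced) vs {b:.1f} ms/step (throughput increased)")
```

Running it prints `FP32: 99.0 ms/step (time reduced) vs 124.1 ms/step (throughput increased)` and `FP16: 68.4 ms/step (time reduced) vs 111.1 ms/step (throughput increased)`, so the choice of reading changes the implied FP16 latency by more than 40 ms/step.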
Pages: 43109–43125
Page count: 17