Efficient Real-Time Smart Keyword Spotting Using Spectrogram-Based Hybrid CNN-LSTM for Edge System

Cited by: 0

Authors:

Syafalni, Infall [1,2,3]

Amadeus, Clarence [1]

Sutisna, Nana [1,3]

Adiono, Trio [1]

Affiliations:

[1] Bandung Inst Technol, Sch Elect Engn & Informat, Bandung 40132, Indonesia

[2] Bandung Inst Technol, Univ Ctr Excellence Microelect, Bandung 40132, Indonesia

[3] Interuniv Microelect Ctr IMEC, B-3001 Leuven, Belgium

Source:

IEEE Access | 2024 / Vol. 12

Keywords:

Edge computing; hybrid CNN-LSTM; keyword spotting; real-time; embedded devices;

DOI:

10.1109/ACCESS.2024.3380350

Chinese Library Classification (CLC):

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Keyword spotting (KWS) is the task of recognizing spoken command words from a database. With recent applications in human-machine interaction, KWS systems require real-time performance, for which edge computing is a preferable option. To allow KWS systems to run fast and in real time, a low-complexity yet highly accurate AI model is mandatory. In this paper, we propose a comprehensive voice command recognition system design and its hardware implementation. The AI model proposed for this system is SpectroNet-based, an efficient hybrid CNN-LSTM architecture with low complexity. Jetson Xavier NX is used as the edge device because of its strong computational power as an embedded device. The implementation results show that the proposed method performs well in terms of accuracy, as indicated by no accuracy drop between the model implemented on a PC and on the Jetson Xavier. However, the inference time is quite high, at 180 ms/step. To improve the speed of the system, the TensorRT library is used to further optimize the model. The optimization is found to be effective, reducing the total number of operations performed in SpectroNet by 59.35% when FP32 precision is used and by 59.63% when FP16 precision is used. The model is also sped up by 45% in FP32 precision mode and by 62% in FP16 precision mode. However, there is a slight accuracy drop of 2.68% in FP32 precision mode and 4.84% in FP16 precision mode. This slight drop in accuracy is considered negligible compared to the performance boost that TensorRT gives. The work is useful for intelligent control systems such as smart vehicles, smartphones, computers, and smart communications.
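
The abstract gives a baseline of 180 ms/step and relative speedups of 45% (FP32) and 62% (FP16), but it does not state the resulting latencies or define "sped up". The sketch below works through the arithmetic under two possible readings. It assumes the speedups are measured against the 180 ms/step figure, and both readings and all function names are illustrative assumptions, not values or definitions taken from the paper.

```python
BASELINE_MS = 180.0  # per-step inference time quoted in the abstract

# Speedup figures quoted in the abstract, expressed as fractions.
SPEEDUP = {"FP32": 0.45, "FP16": 0.62}


def latency_if_time_reduced(baseline_ms: float, fraction: float) -> float:
    """Reading A (assumption): 'sped up by x%' means latency falls by x%."""
    return baseline_ms * (1.0 - fraction)


def latency_if_throughput_raised(baseline_ms: float, fraction: float) -> float:
    """Reading B (assumption): 'sped up by x%' means throughput rises by x%."""
    return baseline_ms / (1.0 + fraction)


for mode, fraction in SPEEDUP.items():
    a = latency_if_time_reduced(BASELINE_MS, fraction)
    b = latency_if_throughput_raised(BASELINE_MS, fraction)
    print(f"{mode}: reading A = {a:.1f} ms/step, reading B = {b:.1f} ms/step")
```

Under reading A the FP16 latency would be roughly 68 ms/step, and under reading B roughly 111 ms/step. The full paper should be consulted for the measured values.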

Pages: 43109-43125

Page count: 17
