Attention-Based Bimodal Neural Network Speech Recognition System on FPGA

Cited by: 0
Authors
Chen, Aiwu [1]
Affiliations
[1] College of Intelligent Manufacturing (CIM), Hunan University of Science and Engineering (HUSE), Yongzhou 425199, China
Source
Informatica (Slovenia) | 2025, Vol. 49, No. 13
Keywords
Audiovisual; Gates (transistor); Neural networks; Speech enhancement; Speech recognition
DOI
10.31449/inf.v49i13.7154
Abstract
To further improve the accuracy of speech recognition technology, a neural network speech recognition system based on a field-programmable gate array (FPGA) is designed. First, a neural network audiovisual bimodal speech recognition algorithm based on an attention mechanism is designed. Then, an FPGA-based speech recognition platform is built. The results showed that the word error rate and character error rate of the proposed algorithm were 3.17% and 1.56%, respectively, significantly lower than the 26.24% and 12.56% of the traditional Lip-Reading Network algorithm. The algorithm converged quickly within the first 10 training epochs and stabilized by epoch 20. The proposed platform made heavy use of DSP units, with a utilization rate of 83.2%, and achieved the lowest power consumption (2.21 W), the highest energy-efficiency ratio (26.15), and the shortest processing time among the compared implementations. In summary, by introducing an attention mechanism, the proposed algorithm allocates learning weights appropriately, improves training speed, and demonstrates feasibility and effectiveness. It performs well in speech recognition, helping to improve the accuracy of speech recognition algorithms and facilitate human-machine communication. © 2025 Slovene Society Informatika. All rights reserved.
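The abstract does not describe the network architecture in detail. As a rough illustration only, the following minimal PyTorch sketch shows one common form of attention-based audiovisual fusion, where audio frames attend to visual (lip-region) features so that the attention weights allocate per-frame importance across modalities. All layer sizes, module names, and the GRU/classifier head here are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class AttentionBimodalFusion(nn.Module):
    """Illustrative audio-visual fusion with cross-modal attention.

    Hypothetical dimensions; the paper's actual feature extractors,
    network layout, and decoder are not specified in the abstract.
    """
    def __init__(self, audio_dim=80, visual_dim=512, model_dim=256,
                 num_heads=4, vocab_size=40):
        super().__init__()
        # Project each modality into a shared embedding space.
        self.audio_proj = nn.Linear(audio_dim, model_dim)
        self.visual_proj = nn.Linear(visual_dim, model_dim)
        # Audio frames attend to visual frames (cross-modal attention).
        self.cross_attn = nn.MultiheadAttention(model_dim, num_heads,
                                                batch_first=True)
        # Temporal modelling of the fused sequence.
        self.encoder = nn.GRU(model_dim, model_dim, batch_first=True)
        # Per-frame output distribution (e.g. for a CTC-style decoder).
        self.classifier = nn.Linear(model_dim, vocab_size)

    def forward(self, audio_feats, visual_feats):
        # audio_feats:  (batch, T_a, audio_dim)   e.g. filterbank frames
        # visual_feats: (batch, T_v, visual_dim)  e.g. lip-region embeddings
        a = self.audio_proj(audio_feats)
        v = self.visual_proj(visual_feats)
        # The attention weights decide how much each audio frame borrows
        # from each visual frame, i.e. they allocate per-frame weights.
        fused, attn_weights = self.cross_attn(query=a, key=v, value=v)
        fused = fused + a                      # residual connection
        encoded, _ = self.encoder(fused)
        return self.classifier(encoded), attn_weights


if __name__ == "__main__":
    model = AttentionBimodalFusion()
    audio = torch.randn(2, 100, 80)     # 100 audio frames per utterance
    video = torch.randn(2, 25, 512)     # 25 video frames per utterance
    logits, weights = model(audio, video)
    print(logits.shape, weights.shape)  # (2, 100, 40), (2, 100, 25)
```

A fusion layer of this kind would typically sit between the per-modality feature extractors and the decoder; for FPGA deployment, the matrix multiplications inside the projection and attention layers are the natural candidates for mapping onto the DSP units mentioned in the abstract.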
Pages: 1-12
Related Papers
50 entries in total
  • [21] EEG emotion recognition using attention-based convolutional transformer neural network
    Gong, Linlin
    Li, Mingyang
    Zhang, Tao
    Chen, Wanzhong
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2023, 84
  • [22] Attention-based recurrent neural network for automatic behavior laying hen recognition
    Laleye, Frejus A. A.
    Mousse, Mikael A.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (22) : 62443 - 62458
  • [23] Attention-Based Deep Neural Network and Its Application to Scene Text Recognition
    He, Haizhen
    Li, Jiehan
    2019 IEEE 11TH INTERNATIONAL CONFERENCE ON COMMUNICATION SOFTWARE AND NETWORKS (ICCSN 2019), 2019, : 672 - 677
  • [25] Underwater acoustic target recognition using attention-based deep neural network
    Xiao, Xu
    Wang, Wenbo
    Ren, Qunyan
    Gerstoft, Peter
    Ma, Li
JASA EXPRESS LETTERS, 2021, 1 (10)
  • [26] 4D attention-based neural network for EEG emotion recognition
    Xiao, Guowen
    Shi, Meng
    Ye, Mengwen
    Xu, Bowen
    Chen, Zhendi
    Ren, Quansheng
    COGNITIVE NEURODYNAMICS, 2022, 16 (04) : 805 - 818
  • [27] Attention-Based Multi-Filter Convolutional Neural Network for Inappropriate Speech Detection
    Lin, Shu-Yu
    Chen, Yi-Ling
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [28] Effective Exploitation of Posterior Information for Attention-Based Speech Recognition
    Tang, Jian
    Hou, Junfeng
    Song, Yan
    Dai, Li-Rong
    McLoughlin, Ian
IEEE ACCESS, 2020, 8 (08): 108988 - 108999
  • [29] Attention-based Contextual Language Model Adaptation for Speech Recognition
    Martinez, Richard Diehl
    Novotney, Scott
    Bulyko, Ivan
    Rastrow, Ariya
    Stolcke, Andreas
    Gandhe, Ankur
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 1994 - 2003
  • [30] An attention-based network for serial number recognition on banknotes
    Lin, Zhijie
    He, Zhaoshui
    Tan, Beihai
    Shen, Yijiang
    Wang, Peitao
    Liu, Taiheng
    SIGNAL PROCESSING-IMAGE COMMUNICATION, 2022, 106