Audio Recognition Using Deep Learning for Edge Devices

Authors
Kulkarni, Aditya [1]
Jabade, Vaishali [1]
Patil, Aniket [1, 2]
Affiliations
[1] Vishwakarma Inst Technol, Elect & Telecommun, Pune, India
[2] Ifm Engn Private Ltd, Artificial Intelligence Team, Pune, India
Keywords
Convolutional Neural Networks; Deep Learning; Short Time Fourier Transform; Spectrogram; Speech Recognition
DOI
10.1007/978-3-031-12641-3_16
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper proposes a methodology for building an automatic speech recognition system whose task is to recognize keywords. Deep learning is used to classify the spoken words. The audio data set created for this work consists of short audio clips, each of which is converted to a spectrogram by computing the Short Time Fourier Transform (STFT) of the audio sample. A spectrogram is a visual representation of the frequency spectrum of a signal as it varies over time. A convolutional neural network (CNN) is a deep learning algorithm used prominently for classifying image data. Here, the spectrogram representation of the audio was used to train CNN models, and the model that achieved the higher recognition rate was deployed on the hardware. The research is motivated by the need for an audio classification system that can be deployed on hardware, where the hardware is assigned a task to complete based on the classification result.
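The audio-to-spectrogram step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stft_spectrogram`, the Hann window, the frame length of 256 samples, the hop of 128 samples, and the synthetic 1 kHz test tone are all assumptions chosen for illustration, since the record does not give the paper's actual parameters.

```python
import numpy as np


def stft_spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram via a short-time Fourier transform.

    Hypothetical helper: frame_len, hop and the Hann window are
    illustrative assumptions, not values taken from the paper.
    Returns an array of shape (frequency_bins, time_frames).
    """
    window = np.hanning(frame_len)
    # Number of full frames that fit in the signal (no padding).
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping frames and apply the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Real-input FFT of each frame; keep only the magnitude.
    spectrum = np.fft.rfft(frames, axis=1)
    return np.abs(spectrum).T


# Synthetic stand-in for a short audio clip: one second of a 1 kHz tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

spec = stft_spectrogram(tone)
# The strongest frequency bin, averaged over time, converted to Hz.
peak_bin = int(spec.mean(axis=1).argmax())
peak_hz = peak_bin * sample_rate / 256
print(spec.shape, peak_hz)
```

The resulting two-dimensional array can be treated like a single-channel image, which is what makes an image classifier such as a CNN applicable to audio in the way the abstract describes.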
Pages: 186-198
Number of pages: 13