Graphic Representation Method and Neural Network Recognition of Time-Frequency Vectors of Speech Information

Cited: 0
Authors
A. O. Zhirkov
D. N. Kortchagine
A. S. Lukin
A. S. Krylov
Yu. M. Bayakovskii
Affiliations
[1] Moscow State University, Department of Computational Mathematics and Cybernetics, Vorob'evy gory
Keywords
Neural Network; Operating System; Graphic Representation; Wavelet Transformation; Recognition Method;
DOI: not available
Abstract
Various time-frequency representations are currently used for sound analysis. On the one hand, these representations are convenient for visual perception of sound by a human; on the other hand, they can be used for automatic analysis of sound patterns. In this paper, various methods for representing sound as two-dimensional time-frequency vectors of fixed dimension are discussed, together with their use in speech and speaker recognition problems. Probabilistic, distance-based, and neural-network methods for recognizing these vectors, using individual words as examples, are considered. Numerical experiments showed that the best of them is the method based on a three-layer neural network, the short-time Fourier transform, and the two-dimensional wavelet transform. For the speaker recognition problem, a distance-based recognition method employing the adaptive Hermite transform proved to be the best.
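The following is a minimal sketch, not the authors' exact pipeline, of the kind of processing chain the abstract describes for word recognition: a short-time Fourier transform produces a spectrogram, a two-dimensional wavelet decomposition compresses it to a fixed-dimension vector, and a small fully connected network (input, hidden, output layers, loosely a "three-layer" net) classifies the result. The window length, the 'haar' wavelet, the hidden-layer width, and the synthetic toy data are illustrative assumptions; pywt, scipy, and scikit-learn are used here only for convenience.

    # Minimal sketch of an STFT + 2-D wavelet + neural-network word classifier.
    # All parameter choices below are illustrative assumptions, not the paper's.
    import numpy as np
    import pywt                      # 2-D discrete wavelet transform (PyWavelets)
    from scipy.signal import stft    # short-time Fourier transform
    from sklearn.neural_network import MLPClassifier

    def time_frequency_vector(signal, fs=8000, nperseg=256, level=3):
        """Map a fixed-length speech fragment to a fixed-dimension feature vector."""
        # Short-time Fourier transform -> magnitude spectrogram (frequency x time).
        _, _, Z = stft(signal, fs=fs, nperseg=nperseg)
        spectrogram = np.abs(Z)
        # 2-D wavelet decomposition; keeping only the coarse approximation
        # compresses the spectrogram to a small array of fixed size
        # (fixed because all input fragments have the same length).
        approx = pywt.wavedec2(spectrogram, wavelet='haar', level=level)[0]
        return approx.ravel()

    # Toy demonstration with synthetic "words": two classes of noisy tones.
    rng = np.random.default_rng(0)
    t = np.arange(8000) / 8000.0                   # 1-second fragments at 8 kHz
    signals = [np.sin(2 * np.pi * (300 + 400 * (i % 2)) * t)
               + 0.1 * rng.standard_normal(t.size) for i in range(20)]
    labels = [i % 2 for i in range(20)]

    X = np.stack([time_frequency_vector(s) for s in signals])
    # One hidden layer -> input, hidden, output: roughly a three-layer network.
    clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)
    clf.fit(X, labels)
    print(clf.predict(X[:2]))                      # expected, e.g., [0 1]

In a real setting each recorded word would first be length-normalized (or the vectors otherwise brought to a common dimension), and the distance-based speaker-recognition variant mentioned in the abstract would replace the classifier with a nearest-template comparison of the feature vectors.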
Pages: 210-218
Number of pages: 8
Related papers
50 records in total
  • [1] Graphic representation method and neural network recognition of time-frequency vectors of speech information
    Zhirkov, AO
    Kortchagine, DN
    Lukin, AS
    Krylov, AS
    Bayakovskii, YM
    PROGRAMMING AND COMPUTER SOFTWARE, 2003, 29 (04) : 210 - 218
  • [2] Speech Emotion Recognition via an Attentive Time-Frequency Neural Network
    Lu, Cheng
    Zheng, Wenming
    Lian, Hailun
    Zong, Yuan
    Tang, Chuangao
    Li, Sunan
    Zhao, Yan
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2023, 10 (06) : 3159 - 3168
  • [3] Time-Frequency Representation and Convolutional Neural Network-Based Emotion Recognition
    Khare, Smith K.
    Bajaj, Varun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2021, 32 (07) : 2901 - 2909
  • [4] Time-frequency representation based cepstral processing for speech recognition
    Fineberg, AB
    Yu, KC
    1996 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, CONFERENCE PROCEEDINGS, VOLS 1-6, 1996, : 25 - 28
  • [5] A time-frequency smoothing neural network for speech enhancement
    Yuan, Wenhao
    SPEECH COMMUNICATION, 2020, 124 : 75 - 84
  • [6] Cepstral representation of speech motivated by time-frequency masking: An application to speech recognition
    Aikawa, K
    Singer, H
    Kawahara, H
    Tohkura, Y
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1996, 100 (01): : 603 - 614
  • [7] Weighting Time-Frequency Representation of Speech using Auditory Saliency for Automatic Speech Recognition
    Cong-Thanh Do
    Stylianou, Yannis
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1591 - 1595
  • [8] Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition
    Arash Dehghani
    Seyyed Ali Seyyedsalehi
    Neural Processing Letters, 2023, 55 : 3205 - 3224
  • [9] Time-Frequency Localization Using Deep Convolutional Maxout Neural Network in Persian Speech Recognition
    Dehghani, Arash
    Seyyedsalehi, Seyyed Ali
    NEURAL PROCESSING LETTERS, 2023, 55 (03) : 3205 - 3224
  • [10] Time-Frequency Representation Learning with Graph Convolutional Network for Dialogue-level Speech Emotion Recognition
    Liu, Jiaxing
    Song, Yaodong
    Wang, Longbiao
    Dang, Jianwu
    Yu, Ruiguo
    INTERSPEECH 2021, 2021, : 4523 - 4527