Phoneme recognition using speech image (spectrogram)

Authors
Ahmadi, M
Bailey, NJ
Hoyle, BS
DOI
Not available
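Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract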
In this paper, a novel feature extraction technique based on the two-dimensional DCT (Discrete Cosine Transform) and zigzag scanning of the spectrogram is proposed. This is in contrast to conventional approaches based on one-dimensional analysis such as LPC, cepstral, or FFT. As a phoneme recognition task, a series of experiments was conducted on the voiced stops ('b', 'd', 'g') of the TIMIT database, uttered by 630 speakers (male and female). The extracted data form the input patterns for training two types of neural network: a semi-dynamic network (TDNN) and a static network (MLP). The highest recognition rates recorded were 77.5 percent for the TDNN and 72.4 percent for the MLP. This contrasts with the 72 percent quoted by Hwang et al. [2] for the same phonemes spoken by 40 female speakers.
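The feature extraction idea in the abstract (a two-dimensional DCT of a spectrogram patch, followed by a zigzag scan that keeps the low-frequency coefficients first) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the orthonormal DCT-II scaling, the JPEG-style zigzag direction, the patch size, and the number of coefficients kept are all assumptions made for illustration.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a sequence (scaling is an assumption)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]
    # cols is stored column-major; transpose back to row-major.
    return [list(r) for r in zip(*cols)]

def zigzag_indices(n_rows, n_cols):
    """(row, col) pairs in JPEG-style zigzag order, low frequencies first."""
    order = []
    for s in range(n_rows + n_cols - 1):
        # All cells on the anti-diagonal row + col == s, with row increasing.
        diag = [(r, s - r) for r in range(n_rows) if 0 <= s - r < n_cols]
        # Even diagonals are walked with row decreasing, odd ones with row increasing.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order

def extract_features(spectrogram_patch, n_coeffs):
    """Keep the first n_coeffs 2D-DCT coefficients in zigzag order.

    spectrogram_patch is a hypothetical time-frequency patch given as a
    list of equal-length rows; n_coeffs is an illustrative choice.
    """
    coeffs = dct_2d(spectrogram_patch)
    order = zigzag_indices(len(coeffs), len(coeffs[0]))
    return [coeffs[r][c] for r, c in order[:n_coeffs]]

if __name__ == "__main__":
    # Toy 4x4 "spectrogram" patch with a smooth ramp in both directions.
    patch = [[float(r + c) for c in range(4)] for r in range(4)]
    print(extract_features(patch, 6))
```

The resulting fixed-length vector is the kind of input pattern that could then be fed to a classifier such as an MLP. Because the zigzag scan visits low-order coefficients first, truncating it keeps the smooth structure of the patch and discards fine detail.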
Pages: 675-677
Number of pages: 3