Speech Recognition Combining MFCCs and Image Features

Times cited: 4
Authors
Karlos, Stamatis [1]
Fazakis, Nikos [1]
Karanikola, Katerina [1]
Kotsiantis, Sotiris [1]
Sgarbas, Kyriakos [1]
Affiliation
[1] Univ Patras, Patras, Greece
Source
SPEECH AND COMPUTER | 2016 / Vol. 9811
Keywords
ASR; MFCCs; Supervised model; Feature extraction; CBIR features; Classification
DOI
10.1007/978-3-319-43958-7_79
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The automatic speech recognition (ASR) task is a well-known problem in fields such as Natural Language Processing (NLP), Digital Signal Processing (DSP), and Machine Learning (ML). In this work, a robust supervised classification model (MFCCs + autocor + SVM) is presented for feature extraction from solo speech signals. Mel Frequency Cepstral Coefficients (MFCCs) are combined with Content Based Image Retrieval (CBIR) features extracted from the spectrogram produced by each frame of the speech signal. The improvement in classification accuracy obtained with these extended feature vectors, compared with using MFCCs alone, is examined with several classifiers in three scenarios involving different numbers of speakers.
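The abstract does not state the MFCC configuration or which CBIR descriptors were used, so the following is only a minimal NumPy sketch of the general idea, under stated assumptions: MFCCs are computed per frame and summarized by their mean and standard deviation, and the log-power spectrogram is treated as a grayscale image from which a few simple image descriptors are taken and appended to the MFCC vector. The frame sizes, the 26-filter mel bank, and the image descriptors (intensity mean/std, mean gradient magnitude, an 8-bin gray-level histogram) are hypothetical choices, not the authors' feature set; the "autocor" component and the SVM classifier are not reproduced here.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):  # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):  # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_matrix(n_out, n_in):
    """Orthonormal DCT-II basis (first n_out rows)."""
    n = np.arange(n_in)
    k = np.arange(n_out)[:, None]
    m = np.cos(np.pi * k * (2 * n + 1) / (2 * n_in)) * np.sqrt(2.0 / n_in)
    m[0] /= np.sqrt(2.0)
    return m

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (drops the incomplete tail)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def power_spectrogram(frames, n_fft):
    """Hamming-windowed power spectrum of each frame: (n_frames, n_fft//2 + 1)."""
    window = np.hamming(frames.shape[1])
    return np.abs(np.fft.rfft(frames * window, n=n_fft)) ** 2 / n_fft

def mfcc(spec, sr, n_fft, n_filters=26, n_ceps=13):
    """Log mel energies followed by a DCT: (n_frames, n_ceps)."""
    fb = mel_filterbank(n_filters, n_fft, sr)
    log_energies = np.log(spec @ fb.T + 1e-10)
    return log_energies @ dct_matrix(n_ceps, n_filters).T

def image_features(spec, n_bins=8):
    """Hypothetical stand-in for CBIR descriptors of the spectrogram image."""
    img = np.log(spec + 1e-10)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # gray levels in [0, 1]
    hist, _ = np.histogram(img, bins=n_bins, range=(0.0, 1.0))
    hist = hist / hist.sum()  # normalized gray-level histogram
    gy, gx = np.gradient(img)
    edge = np.hypot(gx, gy)  # gradient magnitude as a crude texture cue
    return np.concatenate([[img.mean(), img.std(), edge.mean()], hist])

def extended_features(x, sr, frame_len=400, hop=160, n_fft=512):
    """Return (MFCC-only vector, MFCC + image-feature vector) for one utterance."""
    frames = frame_signal(x, frame_len, hop)
    spec = power_spectrogram(frames, n_fft)
    ceps = mfcc(spec, sr, n_fft)
    mfcc_vec = np.concatenate([ceps.mean(axis=0), ceps.std(axis=0)])
    return mfcc_vec, np.concatenate([mfcc_vec, image_features(spec)])

# Usage on a synthetic one-second 220 Hz tone with a little noise.
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(sr)
base, ext = extended_features(x, sr)
print(base.shape, ext.shape)  # (26,) (37,)
```

To reproduce the comparison the abstract describes, the MFCC-only vectors and the extended vectors for a labeled set of utterances could each be passed to the same classifier (for example, an SVM) and their accuracies compared; keeping the MFCC block as a prefix of the extended vector makes that ablation a simple slice.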
Pages: 651-658
Page count: 8