Recognizing GSM digital speech

Cited by: 9
Authors
Gallardo-Antolín, A [1]
Peláez-Moreno, C [1]
Díaz-de-María, F [1]
Affiliations
[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Leganes 28911, Madrid, Spain
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005, Vol. 13, No. 6
Keywords
coding distortion; Global System for Mobile (GSM) networks; speech coding; speech recognition; tandeming; transmission errors; wireless networks;
DOI
10.1109/TSA.2005.853210
Chinese Library Classification (CLC) number
O42 [Acoustics];
Subject classification codes
070206; 082403;
Abstract
The Global System for Mobile (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specifically designed to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half-rate and full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions. Furthermore, the performance gap widens as the network conditions worsen.
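To make the idea of taking features from the bitstream concrete, the following is a minimal sketch, not the authors' front-end. It assumes the spectral envelope of one frame is available as log-area ratios (the GSM full-rate codec transmits quantized log-area ratios) and turns them into LPC cepstral coefficients without reconstructing the waveform. The function names, the natural-log LAR definition, and the sign convention A(z) = 1 + sum a_i z^-i are assumptions made for illustration; the standard itself uses a piecewise-linear LAR approximation and quantization tables that are omitted here.

```python
import math

def lar_to_reflection(lars):
    # Assumed definition: LAR = ln((1 + k) / (1 - k)), hence k = tanh(LAR / 2).
    return [math.tanh(g / 2.0) for g in lars]

def reflection_to_lpc(ks):
    # Step-up recursion; returns [1, a_1, ..., a_p] for A(z) = 1 + sum a_i z^-i.
    a = [1.0]
    for k in ks:
        prev = a
        m = len(prev)  # order of the polynomial being built
        a = [1.0] + [prev[i] + k * prev[m - i] for i in range(1, m)] + [k]
    return a

def lpc_to_cepstrum(a, n_ceps):
    # Cepstrum of the all-pole model 1 / A(z), computed recursively:
    # c_n = -a_n - (1/n) * sum_{k} k * c_k * a_{n-k}, with a_n = 0 for n > p.
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(max(1, n - p), n))
        c[n] = -(a[n] if n <= p else 0.0) - acc / n
    return c[1:]

# Hypothetical frame: eight illustrative LAR values (not taken from a real bitstream).
frame_lars = [1.2, -0.8, 0.5, -0.3, 0.2, -0.1, 0.05, -0.02]
features = lpc_to_cepstrum(reflection_to_lpc(lar_to_reflection(frame_lars)), 12)
```

Because only envelope parameters enter this chain, bits carrying the excitation signal never influence the resulting feature vector, which mirrors the second advantage stated in the abstract.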
Pages: 1186-1205
Page count: 20
Related papers
50 in total
  • [31] Custom DSP design of a GSM speech coder
    Owall, V
    Andreani, P
    Brange, L
    Nilsson, P
    Wass, A
    Torkelson, M
    JOURNAL OF VLSI SIGNAL PROCESSING, 1995, 11 (03): : 213 - 228
  • [32] GSM to G.729 speech transcoder
    Tsai, SM
    Yang, JF
    ICECS 2001: 8TH IEEE INTERNATIONAL CONFERENCE ON ELECTRONICS, CIRCUITS AND SYSTEMS, VOLS I-III, CONFERENCE PROCEEDINGS, 2001, : 485 - 488
  • [33] Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech
    Mengistu, Kinfe Tadesse
    Rudzicz, Frank
    ADVANCES IN ARTIFICIAL INTELLIGENCE, 2011, 6657 : 291 - 300
  • [34] Recognizing speech in a novel accent: the motor theory of speech perception reframed
    Moulin-Frier, Clement
    Arbib, Michael A.
    BIOLOGICAL CYBERNETICS, 2013, 107 (04) : 421 - 447
  • [35] Quality-aware GSM speech watermarking
    Christabel, Koh Jun-Li
    Emmanuel, Sabu
    Kankanhalli, Mohan S.
    PROCEEDINGS OF 2008 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-10, 2008, : 2965 - +
  • [36] Gyrophone: Recognizing Speech From Gyroscope Signals
    Michalevsky, Yan
    Boneh, Dan
    Nakibly, Gabi
    PROCEEDINGS OF THE 23RD USENIX SECURITY SYMPOSIUM, 2014, : 1053 - 1067
  • [37] Recognizing reverberant speech with RASTA-PLP
    Kingsbury, BED
    Morgan, N
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1259 - 1262
  • [38] RECOGNIZING SPEECH - ON THE MAPPING FROM SOUND TO WORD
    MARCUS, SM
    ATTENTION AND PERFORMANCE, 1984, 10 : 151 - 163
  • [39] Recognizing Emotional States Using Speech Information
    Papakostas, Michalis
    Siantikos, Giorgos
    Giannakopoulos, Theodoros
    Spyrou, Evaggelos
    Sgouropoulos, Dimitris
    GENEDIS 2016: GERIATRICS, 2017, 989 : 155 - 164
  • [40] An optimized iterative clustering framework for recognizing speech
    Ashokkumar Palanivinayagam
    Sureshkumar Nagarajan
    International Journal of Speech Technology, 2020, 23 : 767 - 777