Recognizing GSM digital speech

Cited by: 9
Authors
Gallardo-Antolín, A [1]
Peláez-Moreno, C [1]
Díaz-de-María, F [1]
Affiliation
[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Leganes 28911, Madrid, Spain
Source
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005 / Vol. 13 / No. 6
Keywords
coding distortion; Global System for Mobile (GSM) networks; speech coding; speech recognition; tandeming; transmission errors; wireless networks;
DOI
10.1109/TSA.2005.853210
Chinese Library Classification (CLC) number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The Global System for Mobile (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first one has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specially conceived to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is only affected by the quantization distortion of the spectral envelope. Thus, we avoid the influence of other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective since it is not affected by errors in bits allocated to the excitation signal. We have considered the half-rate and the full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions. Furthermore, the disparity increases as the network conditions worsen.
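The idea of deriving recognition features from the spectral-envelope parameters in the bitstream, rather than from decoded speech, can be illustrated with a minimal sketch. It assumes the envelope parameters of one frame have already been dequantized into reflection coefficients, and it uses the textbook step-up recursion and LPC-to-cepstrum recursion. The function names and the choice of cepstral features are illustrative assumptions, not the procedure described in the paper.

```python
def reflection_to_lpc(k):
    """Step-up recursion: reflection coefficients k[0..p-1] to the
    coefficients a_1..a_p of A(z) = 1 + sum_i a_i z^-i."""
    a = []
    for m, km in enumerate(k, start=1):
        prev = a
        # a_i(m) = a_i(m-1) + k_m * a_{m-i}(m-1), and a_m(m) = k_m
        a = [prev[i] + km * prev[m - 2 - i] for i in range(m - 1)] + [km]
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole envelope 1/A(z)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        # Only terms with 1 <= n-k <= p contribute to the sum.
        acc = sum(j * c[j] * a[n - j - 1] for j in range(max(1, n - p), n))
        direct = a[n - 1] if n <= p else 0.0
        c[n] = -direct - acc / n
    return c[1:]


def frame_features(reflection_coeffs, n_ceps=12):
    """Hypothetical per-frame feature vector built only from the
    spectral envelope; the excitation bits are never consulted."""
    return lpc_to_cepstrum(reflection_to_lpc(reflection_coeffs), n_ceps)
```

Because such a front-end reads only the envelope parameters, bit errors that fall on the excitation fields of a frame leave the feature vector unchanged, which is the second advantage the abstract points to.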
Pages: 1186-1205
Number of pages: 20