Recognizing GSM digital speech

Cited by: 9

Authors:

Gallardo-Antolín, A [1]

Peláez-Moreno, C [1]

Díaz-de-María, F [1]

Affiliations:

[1] Univ Carlos III Madrid, Signal Theory & Commun Dept, Leganes 28911, Madrid, Spain

Source:

IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING | 2005, Vol. 13, No. 6

Keywords:

coding distortion; Global System for Mobile (GSM) networks; speech coding; speech recognition; tandeming; transmission errors; wireless networks;

DOI:

10.1109/TSA.2005.853210

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification codes:

070206 ; 082403 ;

Abstract:

The Global System for Mobile (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first has already received much attention; however, source coding distortion and transmission errors must be explicitly addressed. In this paper, we propose an alternative front-end for speech recognition over GSM networks. This front-end is specifically designed to be effective against source coding distortion and transmission errors. Specifically, we suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and subsequently extracting the feature vectors. This approach offers two significant advantages. First, the recognition system is affected only by the quantization distortion of the spectral envelope, so it avoids the other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end becomes more effective, since it is not affected by errors in the bits allocated to the excitation signal. We have considered the half-rate and full-rate standard codecs and compared the proposed front-end with the conventional approach in two ASR tasks, namely, speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions. Furthermore, the performance gap widens as the network conditions worsen.
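The abstract does not specify how the feature vectors are computed from the bitstream, so the following is only a minimal sketch of one plausible route, not the authors' front-end. It assumes the spectral envelope arrives as dequantized log-area ratios (as in the GSM full-rate codec), converts them to reflection coefficients with the exact inverse of LAR = log10((1 + r) / (1 - r)) rather than the standard's piecewise-linear approximation, builds the LPC polynomial A(z) = 1 + a_1 z^-1 + ... with a step-up recursion under one particular sign convention, and derives LPC cepstral coefficients of 1/A(z). The function names are hypothetical.

```python
import math


def lar_to_reflection(lars):
    """Map log-area ratios to reflection coefficients.

    Assumption: LAR = log10((1 + r) / (1 - r)), inverted exactly as
    r = tanh(ln(10) * LAR / 2).
    """
    return [math.tanh(0.5 * math.log(10.0) * g) for g in lars]


def reflection_to_lpc(refl):
    """Step-up recursion: reflection coefficients -> [a_1, ..., a_p].

    Assumed convention: A(z) = 1 + sum_k a_k z^-k, with a_i of the
    order-i polynomial equal to the i-th reflection coefficient.
    """
    a = []
    for k in refl:
        # a_j(new) = a_j(old) + k * a_{i-j}(old), then append a_i = k
        a = [a[j] + k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
    return a


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole filter 1 / A(z)."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)  # c[0] is left unused (gain term omitted)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = -acc
    return c[1:]


def features_from_lars(lars, n_ceps=12):
    """Chain the three steps for one frame of envelope parameters."""
    return lpc_to_cepstrum(reflection_to_lpc(lar_to_reflection(lars)), n_ceps)
```

Because only the envelope parameters are read, bits that encode the excitation signal never enter this computation, which is the property the abstract relies on when transmission errors occur.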

Pages: 1186-1205

Page count: 20
