Robust Automatic Speech Recognition for Call Center Applications

Citations: 0
Authors
Felipe Parra-Gallego, Luis [1, 2]
Arias-Vergara, Tomas [1, 3]
Orozco Arroyave, Juan Rafael [1, 3]
Affiliations
[1] Univ Antioquia UdeA, GITA Lab Fac Engn, Medellin, Colombia
[2] Konecta Grp SAS, Medellin, Colombia
[3] Friedrich Alexander Univ Erlangen Nurnberg, Pattern Recognit Lab, Erlangen, Germany
Keywords
ASR; Noise reduction; Speech enhancement; Speech-to-text
DOI
10.1007/978-3-030-86702-7_7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
This paper focuses on developing an Automatic Speech Recognition (ASR) system that is robust against different noisy scenarios. ASR systems are widely used in call centers to convert telephone recordings into text transcriptions, which are then used as input to automatically evaluate the Quality of Service (QoS). Since QoS and customer satisfaction are evaluated by analyzing the text produced by the ASR system, this process depends heavily on the accuracy of the transcription. Because calls are usually recorded in non-controlled acoustic conditions, the accuracy of the ASR typically decreases. To address this problem, we first evaluated four different hybrid architectures: (1) Gaussian Mixture Models (GMM) (baseline), (2) Time Delay Neural Network (TDNN), (3) Long Short-Term Memory (LSTM), and (4) Gated Recurrent Unit (GRU). The evaluation is performed on a total of 478.6 h of recordings collected in a real call center. Each recording has its respective transcription and one of three perceptual labels describing the level of noise present during the phone call: Low level of Noise (LN), Medium level of Noise (MN), and High level of Noise (HN). The LSTM-based model achieved the best performance in the MN and HN scenarios, with word error rates (WER) of 22.55% and 27.99%, respectively. Additionally, we implemented a denoiser based on GRUs to enhance the speech signals, and the results improved by 1.16% in the HN scenario.
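The abstract reports its results as word error rate (WER). As a minimal sketch of how that metric is commonly computed (this is not the authors' scoring tool, and the function name `word_error_rate` and the example sentences are hypothetical), WER is the word-level edit distance between the reference transcription and the ASR output, divided by the number of reference words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: both inputs are already normalized (case, punctuation)
    and are tokenized by whitespace; real scoring tools apply more
    elaborate normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, not taken from the paper's corpus.
    reference = "the call was dropped twice"
    hypothesis = "the call dropped twice today"
    print(f"WER = {word_error_rate(reference, hypothesis):.2%}")
```

Under this definition, a WER of 22.55% corresponds to roughly 23 word errors per 100 reference words. Note that WER can exceed 100% when the hypothesis contains many insertions.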
Pages: 72-83
Number of pages: 12