Performance Analysis and Optimization of Automatic Speech Recognition

Cited by: 3
Authors
Tabani, Hamid [1]
Arnau, Jose-Maria [1]
Tubella, Jordi [1]
Gonzalez, Antonio [1]
Affiliations
[1] Univ Politecn Cataluna, Comp Architecture Dept, ES-08034 Barcelona, Spain
Keywords
Automatic speech recognition; Gaussian mixture models; vectorization
DOI
10.1109/TMSCS.2017.2739158
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fast and accurate Automatic Speech Recognition (ASR) is emerging as a key application for mobile devices. Delivering ASR on such devices is challenging due to the compute-intensive nature of the problem and the power constraints of embedded systems. In this paper, we provide a performance and energy characterization of Pocketsphinx, a popular toolset for ASR that targets mobile devices. We identify the computation of the Gaussian Mixture Model (GMM) as the main bottleneck, consuming more than 80 percent of the execution time. The CPI stack analysis shows that branches and main memory accesses are the main performance limiting factors for GMM computation. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. First, we use a refactored implementation of the innermost loop of the GMM evaluation code to ameliorate the impact of branches. Second, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. Third, we compute the Gaussians for multiple frames in parallel, so means and variances can be fetched once in the on-chip caches and reused across multiple frames, significantly reducing memory bandwidth usage. We evaluate our optimizations using both hardware counters on real CPUs and simulations. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61 percent energy savings. 
On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59 percent energy savings without any loss in the accuracy of the ASR system.
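
The third optimization in the abstract (scoring several frames against each Gaussian before moving on to the next one) can be sketched as a loop interchange. The code below is a minimal illustration in plain Python, not the authors' implementation and not Pocketsphinx code: the function names, the diagonal-covariance scoring formula, and the precomputed `inv_vars` / `log_norms` arrays are all assumptions made for the example. It shows only the loop structure; the cache and vector-unit benefits described in the abstract would appear in a compiled implementation, where the innermost loop over frames could map onto vector lanes.

```python
import math


def precompute(variances):
    """Hypothetical helper: turn per-dimension variances into the two
    quantities the scoring loops need, 1/(2*var) and the log normalizer."""
    inv_vars = [[1.0 / (2.0 * v) for v in row] for row in variances]
    log_norms = [-0.5 * sum(math.log(2.0 * math.pi * v) for v in row)
                 for row in variances]
    return inv_vars, log_norms


def score_frame_by_frame(frames, means, inv_vars, log_norms):
    """Baseline order: for every frame, walk through all Gaussians.
    Each Gaussian's means and variances are re-read once per frame."""
    scores = []
    for frame in frames:
        row = []
        for mean, inv_var, log_norm in zip(means, inv_vars, log_norms):
            acc = log_norm
            for x, m, v in zip(frame, mean, inv_var):
                diff = x - m
                acc -= diff * diff * v
            row.append(acc)
        scores.append(row)
    return scores


def score_batched(frames, means, inv_vars, log_norms):
    """Batched order: read each Gaussian's parameters once and reuse
    them for every frame in the batch before moving to the next Gaussian."""
    n_frames = len(frames)
    scores = [[0.0] * len(means) for _ in range(n_frames)]
    for g, (mean, inv_var, log_norm) in enumerate(
            zip(means, inv_vars, log_norms)):
        acc = [log_norm] * n_frames
        for d, (m, v) in enumerate(zip(mean, inv_var)):
            # One (mean, variance) pair is applied to all frames here;
            # in compiled code this loop is the vectorization candidate.
            for t in range(n_frames):
                diff = frames[t][d] - m
                acc[t] -= diff * diff * v
        for t in range(n_frames):
            scores[t][g] = acc[t]
    return scores
```

Both functions perform the same arithmetic in the same order for each (frame, Gaussian) pair, so they return the same log-likelihoods; only the order in which the parameters are touched differs. The batch size (here, the whole `frames` list) is a tuning choice that the sketch leaves open.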
Pages: 847-860
Page count: 14