Performance Analysis and Optimization of Automatic Speech Recognition

Cited by: 3
Authors
Tabani, Hamid [1]
Arnau, Jose-Maria [1]
Tubella, Jordi [1]
Gonzalez, Antonio [1]
Affiliations
[1] Univ Politecn Cataluna, Comp Architecture Dept, ES-08034 Barcelona, Spain
Keywords
Automatic speech recognition; Gaussian mixture models; vectorization
DOI
10.1109/TMSCS.2017.2739158
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Fast and accurate Automatic Speech Recognition (ASR) is emerging as a key application for mobile devices. Delivering ASR on such devices is challenging due to the compute-intensive nature of the problem and the power constraints of embedded systems. In this paper, we provide a performance and energy characterization of Pocketsphinx, a popular toolset for ASR that targets mobile devices. We identify the computation of the Gaussian Mixture Model (GMM) as the main bottleneck, consuming more than 80 percent of the execution time. The CPI stack analysis shows that branches and main memory accesses are the main performance limiting factors for GMM computation. We propose several software-level optimizations driven by the power/performance analysis. Unlike previous proposals that trade accuracy for performance by reducing the number of Gaussians evaluated, we maintain accuracy and improve performance by effectively using the underlying CPU microarchitecture. First, we use a refactored implementation of the innermost loop of the GMM evaluation code to ameliorate the impact of branches. Second, we exploit the vector unit available on most modern CPUs to boost GMM computation, introducing a novel memory layout for storing the means and variances of the Gaussians in order to maximize the effectiveness of vectorization. Third, we compute the Gaussians for multiple frames in parallel, so means and variances can be fetched once in the on-chip caches and reused across multiple frames, significantly reducing memory bandwidth usage. We evaluate our optimizations using both hardware counters on real CPUs and simulations. Our experimental results show that the proposed optimizations provide 2.68x speedup over the baseline Pocketsphinx decoder on a high-end Intel Skylake CPU, while achieving 61 percent energy savings. 
On a modern ARM Cortex-A57 mobile processor our techniques improve performance by 1.85x, while providing 59 percent energy savings without any loss in the accuracy of the ASR system.
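The third optimization in the abstract (evaluating the Gaussians for several frames together so that means and variances are fetched once and reused) can be illustrated with a minimal sketch. This is not the Pocketsphinx code or the authors' implementation: the function names, the tuple layout `(mean, inv_var, log_norm)`, and the `batch` parameter are all hypothetical, mixture weights and the log-sum over components are omitted, and plain Python only shows the loop order, not the cache or SIMD benefit a C implementation would see.

```python
import math

def make_gaussian(mean, var):
    """Precompute inverse variances and the log normalizer (hypothetical layout)."""
    inv_var = [1.0 / v for v in var]
    log_norm = -0.5 * sum(math.log(2.0 * math.pi * v) for v in var)
    return (mean, inv_var, log_norm)

def log_gaussian(frame, mean, inv_var, log_norm):
    """Log-density of one diagonal-covariance Gaussian for one feature frame."""
    dist = 0.0
    for x, m, iv in zip(frame, mean, inv_var):
        d = x - m
        dist += d * d * iv
    return log_norm - 0.5 * dist

def score_per_frame(frames, gaussians):
    """Baseline order: for every frame, walk over all Gaussians."""
    return [[log_gaussian(f, *g) for g in gaussians] for f in frames]

def score_batched(frames, gaussians, batch=4):
    """Frame-batched order: each Gaussian is read once per batch of frames."""
    scores = [[0.0] * len(gaussians) for _ in frames]
    for start in range(0, len(frames), batch):
        block = range(start, min(start + batch, len(frames)))
        for j, (mean, inv_var, log_norm) in enumerate(gaussians):
            # The same mean/variance data is reused for every frame in the block.
            for t in block:
                scores[t][j] = log_gaussian(frames[t], mean, inv_var, log_norm)
    return scores
```

Both functions return the same scores; only the traversal order differs. In a compiled implementation, the batched order is what would let the parameters of a Gaussian stay in on-chip caches while several frames are scored against it.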
Pages: 847-860
Page count: 14