Variable Frame Rate Acoustic Models using Minimum Error Reinforcement Learning

Cited by: 1
Authors
Jiang, Dongcheng [1]
Zhang, Chao [1]
Woodland, Philip C. [1]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge, England
Source
Keywords
Low frame rate; variable frame rate; minimum error rate; multi-task; reinforcement learning
DOI
10.21437/Interspeech.2021-2198
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification codes
100104; 100213
Abstract
Frame selection in automatic speech recognition (ASR) systems can potentially improve the trade-off between speed and accuracy relative to fixed low frame rate methods. In this paper, a sequence training approach based on minimum error and reinforcement learning is proposed that allows a hybrid ASR system to operate at a variable frame rate, using a frame selection controller to predict the number of frames to skip before taking the next inference action. The controller is integrated into the acoustic model in a multi-task training framework as an additional regression task, and the controller output can be used to characterise the distribution during reinforcement learning exploration. The reinforcement learning objective minimises a combined measure of the phone error and the average frame rate. ASR experiments using British English multi-genre broadcast (MGB3) data show that the proposed approach achieved a lower frame rate than a fixed 1/3 low frame rate method and was able to reduce the word error rate relative to both fixed low frame rate and full frame rate systems.
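The frame-skipping idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `select_frames`, `predict_skip`, `max_skip`, `frame_rate`, `combined_cost` and `weight` are hypothetical. Rounding the regression output to an integer skip count and combining phone error and frame rate as a weighted sum are both assumptions, since the abstract does not give the exact form.

```python
from typing import Callable, List


def select_frames(num_frames: int,
                  predict_skip: Callable[[int], float],
                  max_skip: int = 4) -> List[int]:
    """Return indices of the frames on which the acoustic model is evaluated.

    predict_skip stands in for the controller (hypothetical): given the
    current frame index, it returns a real-valued number of frames to skip.
    """
    selected = []
    t = 0
    while t < num_frames:
        selected.append(t)
        # Assumption: clamp and round the regression output to a skip count.
        skip = min(max(int(round(predict_skip(t))), 0), max_skip)
        t += skip + 1
    return selected


def frame_rate(selected: List[int], num_frames: int) -> float:
    """Fraction of frames actually evaluated (1.0 means full frame rate)."""
    return len(selected) / num_frames


def combined_cost(phone_error: float, rate: float, weight: float = 1.0) -> float:
    """Assumed weighted sum of phone error and average frame rate."""
    return phone_error + weight * rate


if __name__ == "__main__":
    # A toy controller that always skips two frames behaves like a
    # fixed 1/3 low frame rate system.
    frames = select_frames(12, lambda t: 2.0)
    print(frames)                   # [0, 3, 6, 9]
    print(frame_rate(frames, 12))
```

A learned controller would return different skip values at different positions, so the resulting frame rate would vary with the input rather than stay fixed at 1/3.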
Pages: 2601-2605
Number of pages: 5
Related papers
50 in total
  • [1] Using Reinforcement Learning and Error Models for Drone Precise Landing
    Saryazdi, Sepehr
    Alkouz, Balsam
    Bouguettaya, Athman
    Lakhdari, Abdallah
    ACM Transactions on Internet Technology, 2024, 24 (03)
  • [2] Speaking Rate Dependent Multiple Acoustic Models Using Continuous Frame Rate Normalization
    Ban, Sung Min
    Kim, Hyung Soon
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012
  • [3] An Adaptive Minimum-Frame-Error Rate Detector for Magnetic Recording
    Shi, Shanwei
    Barry, John R.
    IEEE TRANSACTIONS ON MAGNETICS, 2021, 57 (12)
  • [4] Lower Frame Rate Neural Network Acoustic Models
    Pundak, Golan
    Sainath, Tara N.
    17TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2016), VOLS 1-5: UNDERSTANDING SPEECH PROCESSING IN HUMANS AND MACHINES, 2016, : 22 - 26
  • [5] An Adaptive Minimum-Frame-Error-Rate BCJR Detector for Magnetic Recording
    Zhang, Yucheng
    Shi, Shanwei
    Barry, John R.
    IEEE TRANSACTIONS ON MAGNETICS, 2023, 59 (11)
  • [6] Investigating data selection for minimum phone error training of acoustic models
    Liu, Shih-Hung
    Chu, Fang-Hui
    Lin, Shih-Hsiang
    Chen, Berlin
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 348 - 351
  • [7] Multiagent learning using a variable learning rate
    Bowling, M
    Veloso, M
    ARTIFICIAL INTELLIGENCE, 2002, 136 (02) : 215 - 250
  • [8] Minimum Classification Error Training of Hidden Markov Models for Acoustic Language Identification
    Bauer, Josef G.
    Timoshenko, Ekaterina
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 405 - 408
  • [9] Learning FRAME Models Using CNN Filters
    Lu, Yang
    Zhu, Song-Chun
    Wu, Ying Nian
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 1902 - 1910
  • [10] VARIABLE FRAME RATE SPEECH CODING USING OPTIMAL INTERPOLATION
    CHUNG, CJ
    CHEN, SH
    IEEE TRANSACTIONS ON COMMUNICATIONS, 1994, 42 (06) : 2215 - 2218