TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION

Times Cited: 3
Authors
Li, Mohan [1 ]
Zhang, Shucong [1 ]
Zorila, Catalin [1 ]
Doddipatla, Rama [1 ]
Affiliations
[1] Toshiba Europe Ltd, Cambridge Res Lab, Cambridge, England
Keywords
End-to-end ASR; Transformer; online attention mechanism; cumulative attention
DOI
10.1109/ICASSP43922.2022.9746693
CLC Number
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
In this paper, we propose an online attention mechanism, known as cumulative attention (CA), for streaming Transformer-based automatic speech recognition (ASR). Inspired by the monotonic chunkwise attention (MoChA) and head-synchronous decoder-end adaptive computation steps (HS-DACS) algorithms, CA triggers the ASR outputs based on the acoustic information accumulated at each encoding timestep, where the decisions are made by a trainable device referred to as the halting selector. In CA, all the attention heads of the same decoder layer are synchronised to share a unified halting position. This feature effectively alleviates the problem caused by the distinct behaviour of individual heads, which may otherwise give rise to severe latency issues, as encountered by MoChA. ASR experiments conducted on the AIShell-1 and LibriSpeech datasets demonstrate that the proposed CA-based Transformer system achieves on-par or better performance with a significant reduction in latency during inference, compared to other streaming Transformer systems in the literature.
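The abstract describes the CA mechanism only at a high level. As a rough illustration of the idea, the NumPy sketch below shows how a running (cumulative) context and a small halting selector could decide when to emit the next token. It is a minimal, single-head sketch: the function name `cumulative_attention_step`, the linear-plus-sigmoid selector with parameters `w_halt`/`b_halt`, and the fixed 0.5 threshold are all hypothetical illustration choices, not the authors' implementation, which trains the selector jointly with the ASR model and synchronises all heads of a decoder layer at one unified halting position.

```python
import numpy as np

def cumulative_attention_step(enc_states, query, w_halt, b_halt, threshold=0.5):
    """Single decoding step of a cumulative-attention-style halting sketch.

    enc_states : (T, d) encoder outputs available so far in the stream
    query      : (d,)   decoder query for the current output token
    w_halt, b_halt : parameters of a hypothetical linear halting selector
    Returns the context vector and the halting position.
    """
    T, d = enc_states.shape

    # Scaled dot-product energies of the query against each encoder timestep.
    energies = enc_states @ query / np.sqrt(d)            # (T,)
    weights = np.exp(energies - energies.max())
    weights /= weights.sum()

    # Accumulate acoustic information timestep by timestep; the halting
    # selector inspects the running (cumulative) context and fires once it
    # judges that enough evidence has been seen to emit the next token.
    context = np.zeros(d)
    halt_pos = T - 1                                      # fall back to the last frame
    for t in range(T):
        context += weights[t] * enc_states[t]
        halt_prob = 1.0 / (1.0 + np.exp(-(context @ w_halt + b_halt)))
        if halt_prob > threshold:                         # halting decision
            halt_pos = t
            break
    return context, halt_pos


# Toy usage with random tensors (shapes only; no trained parameters).
rng = np.random.default_rng(0)
enc = rng.standard_normal((20, 8))                        # 20 frames, model dim 8
ctx, pos = cumulative_attention_step(enc, rng.standard_normal(8),
                                     rng.standard_normal(8), 0.0)
print(pos)
```

In the paper itself the halting decisions additionally bound the latency of the streaming decoder; the toy above only illustrates the accumulate-then-halt control flow.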
Pages: 8272-8276
Number of Pages: 5
Related Papers
50 records in total
  • [1] HEAD-SYNCHRONOUS DECODING FOR TRANSFORMER-BASED STREAMING ASR
    Li, Mohan
    Zorila, Catalin
    Doddipatla, Rama
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5909 - 5913
  • [2] Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism
    Wei, Kun
    Guo, Pengcheng
    Jiang, Ning
    INTERSPEECH 2022, 2022, : 3804 - 3808
  • [3] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075
  • [4] Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
    Wu, Chunyang
    Wang, Yongqiang
    Shi, Yangyang
    Yeh, Ching-Feng
    Zhang, Frank
    INTERSPEECH 2020, 2020, : 2132 - 2136
  • [5] Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
    Li, Zehan
    Miao, Haoran
    Deng, Keqi
    Cheng, Gaofeng
    Tian, Sanli
    Li, Ta
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 1671 - 1675
  • [6] Transformer-based Acoustic Modeling for Streaming Speech Synthesis
    Wu, Chunyang
    Xiu, Zhiping
    Shi, Yangyang
    Kalinli, Ozlem
    Fuegen, Christian
    Koehler, Thilo
    He, Qing
    INTERSPEECH 2021, 2021, : 146 - 150
  • [7] Attention Calibration for Transformer-based Sequential Recommendation
    Zhou, Peilin
    Ye, Qichen
    Xie, Yueqi
    Gao, Jingqi
    Wang, Shoujin
    Kim, Jae Boum
    You, Chenyu
    Kim, Sunghun
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 3595 - 3605
  • [8] Compute Cost Amortized Transformer for Streaming ASR
    Xie, Yi
    Macoskey, Jonathan
    Radfar, Martin
    Chang, Feng-Ju
    King, Brian
    Rastrow, Ariya
    Mouchtaris, Athanasios
    Strimel, Grant P.
    INTERSPEECH 2022, 2022, : 3043 - 3047
  • [9] Transformer-based attention network for stock movement prediction
    Zhang, Qiuyue
    Qin, Chao
    Zhang, Yunfeng
    Bao, Fangxun
    Zhang, Caiming
    Liu, Peide
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 202