ATTENTION-BASED MULTI-HYPOTHESIS FUSION FOR SPEECH SUMMARIZATION

Cited by: 1
Authors
Kano, Takatomo [1]
Ogawa, Atsunori [1]
Delcroix, Marc [1]
Watanabe, Shinji [2]
Affiliations
[1] NTT Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
Speech Summarization; Automatic Speech Recognition; BERT; Attention-based Fusion
DOI
10.1109/ASRU51503.2021.9687977
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Speech summarization, which generates a text summary from speech, can be achieved by combining automatic speech recognition (ASR) and text summarization (TS). With this cascade approach, we can exploit state-of-the-art models and large training datasets for both subtasks, i.e., Transformer for ASR and Bidirectional Encoder Representations from Transformers (BERT) for TS. However, ASR errors directly affect the quality of the output summary in the cascade approach. We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary. We investigate several schemes to combine ASR hypotheses. First, we propose using the sum of sub-word embedding vectors weighted by their posterior values provided by an ASR system as an input to a BERT-based TS system. Then, we introduce a more general scheme that uses an attention-based fusion module added to a pre-trained BERT module to align and combine several ASR hypotheses. Finally, we perform speech summarization experiments on the How2 dataset and a newly assembled TED-based dataset that we will release with this paper. These experiments show that retraining the BERT-based TS system with these schemes can improve summarization performance and that the attention-based fusion module is particularly effective.
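The first scheme in the abstract, a posterior-weighted sum of sub-word embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `fuse_embeddings`, the toy embedding table, and the renormalization of posteriors over the retained candidates are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def fuse_embeddings(
    candidates: Sequence[Tuple[str, float]],
    embedding_table: Dict[str, List[float]],
) -> List[float]:
    """Return the posterior-weighted sum of candidate sub-word embeddings.

    `candidates` holds (sub-word, posterior) pairs competing for one
    position in the ASR output. Renormalizing the posteriors over the
    kept candidates is an assumption, not something the abstract states.
    """
    total = sum(posterior for _, posterior in candidates)
    if total <= 0:
        raise ValueError("posteriors must sum to a positive value")
    dim = len(next(iter(embedding_table.values())))
    fused = [0.0] * dim
    for token, posterior in candidates:
        weight = posterior / total  # renormalize over the kept candidates
        vector = embedding_table[token]
        for i in range(dim):
            fused[i] += weight * vector[i]
    return fused


# Toy 2-dimensional embedding table (hypothetical values).
toy_table = {"sum": [1.0, 0.0], "some": [0.0, 1.0]}
fused = fuse_embeddings([("sum", 0.75), ("some", 0.25)], toy_table)
print(fused)  # [0.75, 0.25]
```

In a real system the fused vector would replace the single-token embedding at that position before being passed to the BERT-based summarizer; how the candidates are aligned across hypotheses is left out of this sketch.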
Pages: 487-494
Number of pages: 8