ATTENTION-BASED MULTI-HYPOTHESIS FUSION FOR SPEECH SUMMARIZATION

Cited by: 1
Authors
Kano, Takatomo [1 ]
Ogawa, Atsunori [1 ]
Delcroix, Marc [1 ]
Watanabe, Shinji [2 ]
Affiliations
[1] NTT Corp, Tokyo, Japan
[2] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Keywords
Speech Summarization; Automatic Speech Recognition; BERT; Attention-based Fusion;
D O I
10.1109/ASRU51503.2021.9687977
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Speech summarization, which generates a text summary from speech, can be achieved by combining automatic speech recognition (ASR) and text summarization (TS). With this cascade approach, we can exploit state-of-the-art models and large training datasets for both subtasks, i.e., Transformer for ASR and Bidirectional Encoder Representations from Transformers (BERT) for TS. However, ASR errors directly affect the quality of the output summary in the cascade approach. We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary. We investigate several schemes to combine ASR hypotheses. First, we propose using the sum of sub-word embedding vectors weighted by their posterior values provided by an ASR system as an input to a BERT-based TS system. Then, we introduce a more general scheme that uses an attention-based fusion module added to a pre-trained BERT module to align and combine several ASR hypotheses. Finally, we perform speech summarization experiments on the How2 dataset and a newly assembled TED-based dataset that we will release with this paper(1). These experiments show that retraining the BERT-based TS system with these schemes can improve summarization performance and that the attention-based fusion module is particularly effective.
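The first scheme the abstract describes — feeding BERT the sum of sub-word embedding vectors weighted by their ASR posteriors — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embedding table, vocabulary size, and the function name `posterior_weighted_embedding` are assumptions chosen for the example (the paper uses the input embeddings of a pre-trained BERT model, with posteriors from the ASR system).

```python
import numpy as np

# Toy sub-word embedding table; sizes and values are illustrative only
# (a real system would use BERT's input embedding matrix here).
rng = np.random.default_rng(0)
vocab_size, dim = 100, 16
E = rng.normal(size=(vocab_size, dim))

def posterior_weighted_embedding(candidates, posteriors, E):
    """For one token slot, mix the embeddings of the competing sub-word
    candidates, weighted by the ASR posterior of each candidate."""
    p = np.asarray(posteriors, dtype=float)
    p = p / p.sum()                       # renormalize over kept candidates
    return p @ E[np.asarray(candidates)]  # (dim,) convex combination

# Three competing sub-words at one slot, dominated by token id 3.
v = posterior_weighted_embedding([3, 17, 42], [0.7, 0.2, 0.1], E)
```

The resulting vector replaces the usual one-hot embedding lookup at that position, so a mostly-confident ASR slot stays close to its 1-best embedding while uncertain slots blend in the alternatives.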
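The second, more general scheme adds an attention-based fusion module to align and combine several ASR hypotheses. A generic single-head scaled dot-product attention sketch of that idea, in which the 1-best hypothesis attends over token states pooled from all N-best hypotheses, might look like the following; the function names, projection matrices, and shapes are assumptions for illustration, not the paper's exact module.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_hypotheses(query, hyp_states, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: token states of the
    1-best hypothesis (query) attend over token states concatenated
    from all N-best hypotheses, yielding a fused sequence with the
    query's length that downstream BERT layers can consume."""
    kv = np.concatenate(hyp_states, axis=0)  # (total_tokens, d)
    Q, K, V = query @ Wq, kv @ Wk, kv @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # rows sum to 1
    return A @ V

rng = np.random.default_rng(1)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
best = rng.normal(size=(5, d))            # 1-best hypothesis, 5 tokens
nbest = [best, rng.normal(size=(6, d))]   # pooled N-best token states
fused = fuse_hypotheses(best, nbest, Wq, Wk, Wv)
```

Because the attention weights are learned, this module can softly align hypotheses of different lengths, which the fixed posterior-weighted sum cannot do.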
Pages: 487-494
Page count: 8