Modular Speech-to-Text Translation for Zero-Shot Cross-Modal Transfer

Authors
Duquenne, Paul-Ambroise [1, 2]
Schwenk, Holger [1]
Sagot, Benoît [2]
Affiliations
[1] Meta AI, Menlo Park, CA 94025, USA
[2] Inria, Paris, France
DOI
10.21437/Interspeech.2023-2484
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Recent research has shown that independently trained encoders and decoders, combined through a shared fixed-size representation, can achieve competitive performance in speech-to-text translation. In this work, we show that this type of approach can be further improved with multilingual training. We observe significant improvements in zero-shot cross-modal speech translation, even outperforming a supervised approach based on XLSR for several languages.
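The abstract's core idea is that encoders and a decoder are trained separately and communicate only through a fixed-size vector. The toy sketch below illustrates why that enables zero-shot cross-modal transfer. It is a hypothetical illustration and not the authors' implementation. Every part is an assumption: mean pooling is an assumed way to get a fixed-size vector, a small lookup table stands in for the text encoder, the speech encoder is assumed to already emit frames in the shared space, and a nearest-neighbour lookup stands in for a learned text decoder. The decoder is built only from text-encoder outputs. Because both encoders land in the same space, it can still decode a speech-encoder output.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Collapse a variable-length sequence of equal-size vectors into one fixed-size vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def text_encoder(tokens: Sequence[str], table: Dict[str, Vector]) -> Vector:
    # Hypothetical text encoder: toy per-token lookup followed by mean pooling.
    return mean_pool([table[token] for token in tokens])


def speech_encoder(frames: Sequence[Vector]) -> Vector:
    # Hypothetical speech encoder: assumes its frames already live in the
    # shared space, so pooling alone yields a comparable fixed-size vector.
    return mean_pool(frames)


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestNeighbourDecoder:
    """Hypothetical stand-in for a learned text decoder, built only from text-encoder outputs."""

    def __init__(self, memory: List[Tuple[Vector, str]]) -> None:
        self.memory = memory  # (sentence embedding, target translation) pairs

    def decode(self, embedding: Vector) -> str:
        # Return the translation whose stored embedding is closest to the input.
        return min(self.memory, key=lambda item: squared_distance(item[0], embedding))[1]


# "Training": the decoder only ever sees embeddings produced by the text encoder.
TABLE = {"hello": [1.0, 0.0], "world": [0.0, 1.0], "good": [0.5, 0.5], "night": [1.0, 1.0]}
decoder = NearestNeighbourDecoder([
    (text_encoder(["hello", "world"], TABLE), "bonjour le monde"),
    (text_encoder(["good", "night"], TABLE), "bonne nuit"),
])

# Zero-shot cross-modal use: plug in the speech encoder without retraining the decoder.
speech_frames = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.4]]
print(decoder.decode(speech_encoder(speech_frames)))  # prints "bonjour le monde"
```

The design keeps the modules swappable. Nothing in the decoder depends on which encoder produced the vector, only on where that vector lands in the shared space.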
Pages: 32-36
Number of pages: 5