Analysis of Acoustic information in End-to-End Spoken Language Translation

Times cited: 0
Authors
Sant, Gerard [1, 2, 3]
Escolano, Carlos [1]
Affiliations
[1] Univ Politecn Cataluna, TALP Res Ctr, Barcelona, Spain
[2] Barcelona Supercomp Ctr, Barcelona, Spain
[3] UPC, Barcelona, Spain
Source
INTERSPEECH 2023
Keywords
Spoken Language Translation; Interpretability of Acoustic Information
DOI
10.21437/Interspeech.2023-2050
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
End-to-end Transformer-based models are the most popular approach for Spoken Language Translation (SLT). Although they obtain state-of-the-art results, we are still far from understanding how these models extract acoustic information from the data and how that information is transformed into semantic representations. In this paper, we seek to provide a better understanding of the flow of acoustic information through speech-to-text translation models. By means of Speaker Classification and Spectrogram Reconstruction tasks, this study (i) interprets the main role of the encoder with respect to the acoustic features, (ii) highlights the importance of acoustic information throughout the model and its transfer between encoder and decoder, and (iii) reveals the significant effect of the downsampling convolutional layers on learning acoustic features. Finally, (iv) we observe a strong correlation between the semantic domain and the speaker labels in MuST-C.
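The abstract singles out the downsampling convolutional layers at the front of the encoder. The sketch below illustrates what such layers do to the input sequence by computing how many encoder positions remain after a stack of strided 1-D convolutions.

The configuration is an assumption modeled on common speech-to-text Transformer setups and is not taken from this record:

- two layers,
- kernel size 5, stride 2, padding 2,
- 10 ms filterbank frames.

The helper names `conv_out_len` and `subsampled_len` are hypothetical.

```python
def conv_out_len(n_frames: int, kernel: int = 5, stride: int = 2, padding: int = 2) -> int:
    """Output length of one 1-D convolution (standard formula, dilation 1)."""
    return (n_frames + 2 * padding - kernel) // stride + 1


def subsampled_len(n_frames: int, n_layers: int = 2, kernel: int = 5, stride: int = 2) -> int:
    """Sequence length after a stack of identical strided conv layers.

    Assumption (hypothetical config): each layer pads by kernel // 2,
    so every stride-2 layer roughly halves the number of frames.
    """
    length = n_frames
    for _ in range(n_layers):
        length = conv_out_len(length, kernel, stride, kernel // 2)
    return length


if __name__ == "__main__":
    frame_shift_ms = 10  # assumed filterbank hop of 10 ms
    n = 100              # 100 frames, i.e. about 1 second of audio
    out = subsampled_len(n)
    print(out)                       # 25 encoder positions
    print(frame_shift_ms * n / out)  # 40.0 ms of audio per position
```

Under these assumptions, one second of audio (100 frames) is reduced to 25 encoder positions, so each position covers roughly 40 ms of input.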
Pages: 52-56
Page count: 5
Related papers (50 in total; first 10 listed)
  • [1] Adapting Transformer to End-to-end Spoken Language Translation
    Di Gangi, Mattia A.
    Negri, Matteo
    Turchi, Marco
    INTERSPEECH 2019, 2019, pp. 1133-1137
  • [2] Error Analysis Applied to End-to-End Spoken Language Understanding
    Caubriere, Antoine
    Ghannay, Sahar
    Tomashenko, Natalia
    De Mori, Renato
    Laurent, Antoine
    Morin, Emmanuel
    Esteve, Yannick
    2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2020), 2020, pp. 8514-8518
  • [3] Towards End-to-End Spoken Language Understanding
    Serdyuk, Dmitriy
    Wang, Yongqiang
    Fuegen, Christian
    Kumar, Anuj
    Liu, Baiyang
    Bengio, Yoshua
    2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2018), 2018, pp. 5754-5758
  • [4] A Streaming End-to-End Framework For Spoken Language Understanding
    Potdar, Nihal
    Avila, Anderson R.
    Xing, Chao
    Wang, Dong
    Cao, Yiran
    Chen, Xiao
    Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence (IJCAI 2021), 2021, pp. 3906-3914
  • [5] Semantic Complexity in End-to-End Spoken Language Understanding
    McKenna, Joseph P.
    Choudhary, Samridhi
    Saxon, Michael
    Strimel, Grant P.
    Mouchtaris, Athanasios
    INTERSPEECH 2020, 2020, pp. 4273-4277
  • [6] WhiSLU: End-to-End Spoken Language Understanding with Whisper
    Wang, Minghan
    Li, Yinglu
    Guo, Jiaxin
    Qiao, Xiaosong
    Li, Zongyao
    Shang, Hengchao
    Wei, Daimeng
    Tao, Shimin
    Zhang, Min
    Yang, Hao
    INTERSPEECH 2023, 2023, pp. 770-774
  • [7] The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation
    He, Mutian
    Garner, Philip N.
    Findings of the Association for Computational Linguistics: EMNLP 2023, 2023, pp. 4408-4423
  • [8] End-to-End Translation Validation for the Halide Language
    Clement, Basile
    Cohen, Albert
    Proceedings of the ACM on Programming Languages (PACMPL), 2022, 6 (OOPSLA)
  • [9] End-to-End Spoken Language Understanding Without Full Transcripts
    Kuo, Hong-Kwang J.
    Tuske, Zoltan
    Thomas, Samuel
    Huang, Yinghui
    Audhkhasi, Kartik
    Kingsbury, Brian
    Kurata, Gakuto
    Kons, Zvi
    Hoory, Ron
    Lastras, Luis
    INTERSPEECH 2020, 2020, pp. 906-910
  • [10] Efficient Use of End-to-End Data in Spoken Language Processing
    Lu, Yiting
    Wang, Yu
    Gales, Mark J. F.
    2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2021), 2021, pp. 7518-7522