Confusion2Vec 2.0: Enriching ambiguous spoken language representations with subwords

Cited by: 1
Authors
Shivakumar, Prashanth Gurunath [1]
Georgiou, Panayiotis [1]
Narayanan, Shrikanth [1]
Shahamiri, Seyed Reza
Affiliations
[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90007 USA
Source
PLOS ONE | 2022 / Vol. 17 / Issue 03
Keywords
CONTEXT; MODELS
DOI
10.1371/journal.pone.0264488
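Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract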
Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation that encodes the ambiguities present in human spoken language in addition to semantic and syntactic information. Confusion2vec provides a robust spoken language representation by considering inherent human language ambiguities. In this paper, we propose a novel word vector space estimation by unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in the Confusion2vec vector space by its constituent subword character n-grams. We show that the subword encoding helps better represent the acoustic perceptual ambiguities in human spoken language via information modeled on lattice-structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using analogy and word similarity tasks designed for assessing semantic, syntactic and acoustic word relations. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs, with improvements of up to 13.12% relative to the previous state of the art in intent detection on the ATIS benchmark dataset. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining or adapting the natural language understanding models on ASR transcripts.
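The subword encoding mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the boundary markers, the n-gram length range (3 to 6), the hashed lookup table and the function names `char_ngrams`, `make_table` and `word_vector` are all assumptions, borrowed from the way character n-gram embeddings are commonly built. The point is only that acoustically similar words such as "see" and "sea" share n-grams, so part of their vectors comes from the same table rows.

```python
import random
import zlib


def char_ngrams(word, n_min=3, n_max=6):
    """Return the character n-grams of a word wrapped in '<' and '>' markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


def make_table(num_buckets, dim, seed=0):
    """Build a randomly initialised embedding table (hypothetical stand-in
    for trained n-gram vectors)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(num_buckets)]


def word_vector(word, table, n_min=3, n_max=6):
    """Average the table rows selected by hashing each n-gram of the word."""
    grams = char_ngrams(word, n_min, n_max)
    dim = len(table[0])
    total = [0.0] * dim
    for gram in grams:
        # Assumption: a CRC32 hash maps each n-gram to a bucket.
        row = table[zlib.crc32(gram.encode("utf-8")) % len(table)]
        for k in range(dim):
            total[k] += row[k]
    return [value / len(grams) for value in total]


if __name__ == "__main__":
    table = make_table(num_buckets=1000, dim=8)
    shared = set(char_ngrams("see", 3, 3)) & set(char_ngrams("sea", 3, 3))
    print("shared trigrams:", shared)
    print("vector length:", len(word_vector("see", table)))
```

In a trained model the table rows would be learned from the ASR lattices rather than drawn at random; the sketch only shows how a word's vector is composed from its subword units.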
Pages: 20