Confusion2Vec 2.0: Enriching ambiguous spoken language representations with subwords

Cited by: 1

Authors:

Shivakumar, Prashanth Gurunath [1]

Georgiou, Panayiotis [1]

Narayanan, Shrikanth [1]

Shahamiri, Seyed Reza

Affiliations:

[1] Univ Southern Calif, Dept Elect & Comp Engn, Los Angeles, CA 90007 USA

Source:

PLOS ONE | 2022 / Volume 17 / Issue 03

Keywords:

CONTEXT; MODELS;

DOI:

10.1371/journal.pone.0264488

Chinese Library Classification number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification code:

07; 0710; 09;

Abstract:

Word vector representations enable machines to encode human language for spoken language understanding and processing. Confusion2vec, motivated by human speech production and perception, is a word vector representation that encodes ambiguities present in human spoken language in addition to semantic and syntactic information. Confusion2vec provides a robust spoken language representation by considering inherent human language ambiguities. In this paper, we propose a novel word vector space estimation by unsupervised learning on lattices output by an automatic speech recognition (ASR) system. We encode each word in the Confusion2vec vector space by its constituent subword character n-grams. We show that the subword encoding helps better represent the acoustic perceptual ambiguities in human spoken language via information modeled on lattice-structured ASR output. The usefulness of the proposed Confusion2vec representation is evaluated using analogy and word similarity tasks designed for assessing semantic, syntactic and acoustic word relations. We also show the benefits of subword modeling for acoustic ambiguity representation on the task of spoken language intent detection. The results significantly outperform existing word vector representations when evaluated on erroneous ASR outputs, providing improvements of up to 13.12% relative to the previous state of the art in intent detection on the ATIS benchmark dataset. We demonstrate that Confusion2vec subword modeling eliminates the need for retraining/adapting the natural language understanding models on ASR transcripts.

Pages: 20
