Unsupervised spoken term discovery using pseudo lexical induction

Cited by: 0

Authors:

Sudhakar P. [1]

Sreenivasa Rao K. [2]

Mitra P. [2]

Affiliations:

[1] Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, West Bengal

[2] Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal

Source:

International Journal of Speech Technology | 2023 / Vol. 26 / No. 3

Keywords:

Context-free grammar; Pattern matching; Self-organising map; Speech processing; Spoken term discovery; Zero-resource;

DOI:

10.1007/s10772-023-10049-6

CLC number:

Subject classification code:

Abstract:

Unsupervised spoken term discovery aims to capture pattern similarities among spoken terms in the absence of annotation. Such an approach is useful for untranscribed spoken content from low-resource or zero-resource languages. A central challenge in the discovery task is computing the similarities among spoken terms without annotation. Dynamic time warping (DTW) is one technique that computes a temporal alignment between two acoustic feature representations of the speech signal without annotation. However, the variability of natural speech poses a challenge to the DTW approach and degrades the performance of the spoken term discovery task. In this study, we address these challenges and improve the performance of the discovery task in three stages. First, a speaker-independent acoustic feature representation is obtained from a Self-Organising Map (SOM) to reduce the variability. Second, non-segmental pseudo-labels are generated for the spoken content using a context-free grammar. Finally, spoken term similarities are obtained by grouping similar sequences using the proposed Label Sequence Similarity Mapping and Language Modelling algorithms. The performance of the proposed system is measured on the Zero-Speech challenge corpus in terms of matching, clustering and parsing quality. The experimental results show that the proposed approach improves performance by 34.2% and 22.4% in English and Xitsonga, respectively, across multiple speakers. In addition, the clustering performance of spoken terms at the word level improves by 4.2% in English. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
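
For readers unfamiliar with the DTW baseline mentioned in the abstract, the following is a minimal sketch of classic dynamic time warping between two sequences of feature vectors. It is not the method proposed in the paper. The function name `dtw_distance`, the Euclidean frame distance, and the toy one-dimensional "features" are all assumptions made for illustration; real systems would typically use frame-level acoustic features and add path constraints.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    `a` and `b` are sequences of equal-dimension numeric tuples
    (hypothetical stand-ins for acoustic feature frames).
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest alignment cost of a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local distance between the two frames (Euclidean, by assumption).
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor steps.
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]


# Toy example: the second sequence is a time-stretched copy of the first.
x = [(0.0,), (1.0,), (2.0,)]
y = [(0.0,), (1.0,), (1.0,), (2.0,)]
stretched_cost = dtw_distance(x, y)
```

Because DTW may repeat frames, a time-stretched copy of a sequence aligns at zero cost. Speaker and channel differences, by contrast, change the frame values themselves, which is the kind of variability the abstract says degrades DTW-based discovery.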

Citation

Pages: 801-816

Page count: 15

50 items in total

[21] Simulating Zero-Resource Spoken Term Discovery
White, Jerome
Oard, Douglas W.
[J]. CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2371 - 2374
[22] WEAKLY SUPERVISED SPOKEN TERM DISCOVERY USING CROSS-LINGUAL SIDE INFORMATION
Bansal, Sameer
Kamper, Herman
Goldwater, Sharon
Lopez, Adam
[J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5760 - 5764
[23] Acquisition of Lexical Semantics through Unsupervised Discovery of Associations between Perceptual Symbols
Oezer, Tuna
[J]. 2008 IEEE 7TH INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING, 2008, : 19 - 24
[24] TOWARD UNSUPERVISED MODEL-BASED SPOKEN TERM DETECTION WITH SPOKEN QUERIES WITHOUT ANNOTATED DATA
Chan, Chun-an
Chung, Cheng-Tao
Kuo, Yu-Hsin
Lee, Lin-shan
[J]. 2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013, : 8550 - 8554
[25] Audio Mining: Unsupervised Spoken Term Detection over an Audio Database
Kumar, Kishore R.
Sarkar, Sandipan
Rengaswamy, Pradeep
Rao, K. Sreenivasa
[J]. 2018 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2018, : 514 - 518
[26] Unsupervised classification of biomedical abstracts using lexical association
Read, Jonathon
Webster, Jonathan
Fang, Alex Chengyu
[J]. PACLIC 24 - Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation, 2010, : 261 - 270
[27] Unsupervised Classification of Biomedical Abstracts using Lexical Association
Read, Jonathon
Webster, Jonathan
Fang, Alex Chengyu
[J]. PROCEEDINGS OF THE 24TH PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION, 2010, : 261 - 270
[28] Unsupervised Learning of Continuous Density HMM for Variable-Length Spoken Unit Discovery
Sun, Meng
Van Hamme, Hugo
Wang, Yimin
Zhang, Xiongwei
[J]. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (01): : 296 - 299
[29] An Evaluation of Graph Clustering Methods for Unsupervised Term Discovery
Lyzinski, Vince
Sell, Gregory
Jansen, Aren
[J]. 16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3209 - 3213
[30] Adaptation of Unsupervised Term Discovery for Speech to Sign Languages
Polat, Korhan
Saraclar, Murat
[J]. 2020 28TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2020,
