Unsupervised spoken term discovery using pseudo lexical induction

Cited by: 0
Authors
Sudhakar P. [1]
Sreenivasa Rao K. [2]
Mitra P. [2]
Affiliations
[1] Advanced Technology Development Centre, Indian Institute of Technology, West Bengal, Kharagpur
[2] Department of Computer Science and Engineering, Indian Institute of Technology, West Bengal, Kharagpur
Keywords
Context-free grammar; Pattern matching; Self-organising map; Speech processing; Spoken term discovery; Zero-resource
DOI
10.1007/s10772-023-10049-6
Abstract
An unsupervised spoken term discovery task aims to capture pattern similarities among spoken terms in the absence of annotation. Such an approach is useful for untranscribed spoken content from low-resource or zero-resource languages. A key challenge in the discovery task is computing the similarities among spoken terms without annotation. Dynamic time warping (DTW) is one technique that computes a temporal alignment between two acoustic feature representations of the speech signal without annotation. However, the variabilities that arise in natural speech pose a challenge for the DTW approach and, as a result, degrade the performance of the spoken term discovery task. In this study, we address these challenges and improve the performance of the discovery task in three stages. First, a speaker-independent acoustic feature representation was obtained from a Self-Organising Map (SOM) to reduce the variabilities. Second, non-segmental pseudo-labels were generated for the spoken content using a context-free grammar. Finally, spoken term similarities were obtained by grouping similar label sequences using the proposed Label Sequence Similarity Mapping and language modelling algorithms. The performance of the proposed system was measured on the ZeroSpeech challenge corpus in terms of matching, clustering and parsing quality. The experimental results show that the proposed approach improves performance by 34.2% and 22.4% in English and Xitsonga, respectively, across multiple speakers. In addition, the word-level clustering performance of the spoken terms improved by 4.2% in English. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
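To make the DTW baseline mentioned in the abstract concrete, the following is a minimal sketch of the textbook DTW recurrence in pure Python. It assumes each utterance is a list of frame-level feature vectors (e.g., MFCCs) and uses a Euclidean local distance; the function names are hypothetical, and this is the standard algorithm, not the authors' SOM and pseudo-label pipeline.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Local (frame-to-frame) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: List[Frame], y: List[Frame]) -> float:
    """Cumulative cost of the best monotonic alignment of sequences x and y."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # x advances while y stays (stretch y)
                cost[i][j - 1],      # y advances while x stays (stretch x)
                cost[i - 1][j - 1],  # both advance (diagonal match)
            )
    return cost[n][m]


if __name__ == "__main__":
    slow = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
    fast = [[0.0], [1.0], [2.0]]
    print(dtw_distance(fast, slow))  # 0.0: tempo difference is absorbed
```

The warping absorbs differences in speaking rate, so a slow and a fast rendition of the same trajectory align at zero cost. Because the local distance still compares raw feature values, however, speaker-dependent shifts in the features inflate the cost; this is presumably the kind of variability the SOM stage described above is meant to reduce.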
Pages: 801-816
Number of pages: 15
Related papers (50 in total)
  • [1] Unsupervised Spoken Term Discovery Using wav2vec 2.0
    Iwamoto, Yu
    Shinozaki, Takahiro
    [J]. 2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1082 - 1086
  • [2] Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery
    Bhati, Saurabhchand
    Villalba, Jesus
    Zelasko, Piotr
    Dehak, Najim
    [J]. INTERSPEECH 2020, 2020, : 4876 - 4880
  • [3] A K-NEAREST NEIGHBOURS APPROACH TO UNSUPERVISED SPOKEN TERM DISCOVERY
    Thual, Alexis
    Dancette, Corentin
    Karadayi, Julien
    Benjumea, Juan
    Dupoux, Emmanuel
    [J]. 2018 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2018), 2018, : 491 - 497
  • [4] Unsupervised Discovery of Structured Acoustic Tokens With Applications to Spoken Term Detection
    Chung, Cheng-Tao
    Lee, Lin-Shan
    [J]. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (02) : 394 - 405
  • [5] Exploring multi-language resources for unsupervised spoken term discovery
    Ludusan, Bogdan
    Caranica, Alexandru
    Cucu, Horia
    Buzo, Andi
    Burileanu, Corneliu
    Dupoux, Emmanuel
    [J]. 2015 INTERNATIONAL CONFERENCE ON SPEECH TECHNOLOGY AND HUMAN-COMPUTER DIALOGUE (SPED), 2015,
  • [6] Unsupervised Discovery of Recurring Spoken Terms Using Diagonal Patterns
    Sudhakar, P.
    Rao, K. Sreenivasa
    Mitra, Pabitra
    [J]. PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2023, 2023, 14301 : 61 - 69
  • [7] Unsupervised discovery of homograph senses using lexical context deconvolution
    Portnoy, David
    Bock, Peter
    [J]. WMSCI 2005: 9th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol 1, 2005, : 198 - 203
  • [8] Unsupervised lexical acquisition of relative spatial concepts using spoken user utterances
    Sagara, Rikunari
    Taguchi, Ryo
    Taniguchi, Akira
    Taniguchi, Tadahiro
    Hattori, Koosuke
    Hoguro, Masahiro
    Umezaki, Taizo
    [J]. ADVANCED ROBOTICS, 2022, 36 (1-2) : 54 - 70
  • [9] TOPIC IDENTIFICATION OF SPOKEN DOCUMENTS USING UNSUPERVISED ACOUSTIC UNIT DISCOVERY
    Kesiraju, Santosh
    Pappagari, Raghavendra
    Ondel, Lucas
    Burget, Lukas
    Dehak, Najim
    Khudanpur, Sanjeev
    Cernocky, Jan Honza
    Gangashetty, Suryakanth V.
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017, : 5745 - 5749
  • [10] Model-Based Unsupervised Spoken Term Detection with Spoken Queries
    Chan, Chun-an
    Lee, Lin-shan
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2013, 21 (07) : 1330 - 1342