Unsupervised spoken term discovery using pseudo lexical induction

Cited by: 0
Authors
Sudhakar P. [1]
Sreenivasa Rao K. [2]
Mitra P. [2]
Affiliations
[1] Advanced Technology Development Centre, Indian Institute of Technology, Kharagpur, West Bengal
[2] Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, West Bengal
Keywords
Context-free grammar; Pattern matching; Self-organising map; Speech processing; Spoken term discovery; Zero-resource
DOI
10.1007/s10772-023-10049-6
Abstract
An unsupervised spoken term discovery task aims to capture the pattern similarities among spoken terms in the absence of annotation. Such an approach is useful for untranscribed spoken content from low-resource or zero-resource languages. A challenge in the discovery task is computing the similarities among spoken terms without annotation. Dynamic time warping (DTW) is one technique that computes a temporal alignment between the acoustic feature representations of two speech signals without annotation. However, the variability that arises in natural speech poses a challenge to the DTW approach and degrades the performance of the spoken term discovery task. In this study, we address these challenges and improve the performance of the discovery task in three stages. First, a speaker-independent acoustic feature representation is obtained from a Self-Organising Map (SOM) to reduce the variability. Second, non-segmental pseudo-labels are generated for the spoken content using a context-free grammar. Finally, the spoken term similarities are obtained by grouping similar sequences using the proposed Label Sequence Similarity Mapping and Language Modelling algorithms. The performance of the proposed system was measured on the Zero-Speech challenge corpus in terms of matching, clustering and parsing quality. The experimental results show that the proposed approach improves performance by 34.2% and 22.4% in English and Xitsonga, respectively, across multiple speakers. In addition, the clustering performance of the spoken terms at the word level improves by 4.2% in English. © 2023, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
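For readers unfamiliar with the DTW baseline mentioned in the abstract, the following is a minimal sketch of textbook dynamic time warping between two sequences of feature vectors. It is not the authors' system: the function name `dtw_distance`, the Euclidean frame distance, and the toy one-dimensional "features" are all assumptions made for illustration, and real systems would typically operate on frames such as MFCCs.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    `a` and `b` are lists of equal-dimension vectors (lists of floats).
    The Euclidean frame distance is an illustrative assumption.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # local frame distance
            # Extend the cheapest of the three allowed predecessor cells:
            # advance in a only, advance in b only, or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


# Toy usage: the second sequence repeats one frame (a slower rendition),
# so the warped alignment cost is still zero.
fast = [[0.0], [1.0], [2.0]]
slow = [[0.0], [1.0], [1.0], [2.0]]
print(dtw_distance(fast, slow))
```

The sketch shows why DTW tolerates differences in speaking rate, and also why it remains sensitive to speaker variability: the local frame distance compares raw feature values directly, which is the weakness the abstract says the SOM-based representation is meant to reduce.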
Pages: 801-816
Page count: 15