Metric Learning for Unsupervised Phoneme Segmentation

Authors
Qiao, Yu [1]
Minematsu, Nobuaki [1]
Affiliation
[1] Univ Tokyo, Grad Sch Engn, Tokyo, Japan
Keywords
Unsupervised phoneme segmentation; optimization; Mahalanobis distance; metric learning
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Unsupervised phoneme segmentation aims to divide a speech stream into phonemes without using any prior knowledge of linguistic content or acoustic models. In [1], we formulated this problem as an optimization problem and developed an objective function, the sum of squared errors (SSE), based on the Euclidean distance between cepstral features. However, it is unknown whether the Euclidean distance is the best metric for estimating the goodness of a segmentation. In this paper, we study how to learn a good metric to improve segmentation performance. We propose two criteria for learning the metric: Minimum of Summation Variance (MSV) and Maximum of Discrimination Variance (MDV). Experimental results on the TIMIT database indicate that using a learned metric achieves better segmentation performance. The best recall rate in this paper is 81.8% (20 ms windows), compared to 77.5% in [1]. We also introduce an iterative algorithm that learns the metric without using labeled data and achieves results similar to those obtained with labeled data.
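The abstract does not give the exact form of the objective, so the following is only a minimal sketch of the general idea: a sum of squared errors over segments, where each frame is compared to its segment mean under a Mahalanobis metric. The function names, the boundary convention, and the toy two-dimensional "features" are all assumptions made for illustration, not the authors' implementation. With the identity matrix the score reduces to the Euclidean SSE.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def mahalanobis_sq(x: Vector, y: Vector, M: Matrix) -> float:
    """Squared distance (x - y)^T M (x - y); M is assumed symmetric PSD."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def segment_mean(frames: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of feature frames."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def sse(frames: Sequence[Vector], boundaries: Sequence[int], M: Matrix) -> float:
    """Sum of squared distances from each frame to its segment mean.

    `boundaries` lists the start index of every segment after the first
    (hypothetical convention): [2] splits into frames[0:2] and frames[2:].
    Every segment is assumed to be non-empty.
    """
    edges = [0, *boundaries, len(frames)]
    total = 0.0
    for start, end in zip(edges, edges[1:]):
        segment = frames[start:end]
        mean = segment_mean(segment)
        total += sum(mahalanobis_sq(f, mean, M) for f in segment)
    return total


# Toy data: two clearly separated groups of frames.
frames = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]       # Euclidean distance
scaled = [[1.0, 0.0], [0.0, 0.25]]        # down-weights the second dimension

good = sse(frames, [2], identity)         # boundary between the two groups
bad = sse(frames, [1], identity)          # misplaced boundary
reweighted = sse(frames, [2], scaled)     # same boundary, different metric
```

In this toy case the boundary that separates the two groups gets the lower score, and changing the matrix changes how much each feature dimension contributes, which is the kind of freedom that metric learning would tune.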
Pages: 1060-1063
Number of pages: 4