TOWARDS MULTI-SPEAKER UNSUPERVISED SPEECH PATTERN DISCOVERY

被引:40
|
作者
Zhang, Yaodong [1 ]
Glass, James R. [1 ]
机构
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
关键词
unsupervised learning; language acquisition;
D O I
10.1109/ICASSP.2010.5495637
中图分类号
O42 [声学];
学科分类号
070206 ; 082403 ;
摘要
In this paper, we explore the use of a Gaussian posteriorgram based representation for unsupervised discovery of speech patterns. Compared with our previous work, the new approach provides significant improvement towards speaker independence. The framework consists of three main procedures: a Gaussian posteriorgram generation procedure which learns an unsupervised Gaussian mixture model and labels each speech frame with a Gaussian posteriorgram representation; a segmental dynamic time warping procedure which locates pairs of similar sequences of Gaussian posteriorgram vectors; and a graph clustering procedure which groups similar sequences into clusters. We demonstrate the viability of using the posteriorgram approach to handle many talkers by finding clusters of words in the TIMIT corpus.
引用
下载
收藏
页码:4366 / 4369
页数:4
相关论文
共 50 条
  • [31] CLeLfPC: a Large Open Multi-Speaker Corpus of French Cued Speech
    Bigi, Brigitte
    Zimmermann, Maryvonne
    Andre, Carine
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 987 - 994
  • [32] Lip2Speech: Lightweight Multi-Speaker Speech Reconstruction with Gabor Features
    Dong, Zhongping
    Xu, Yan
    Abel, Andrew
    Wang, Dong
    APPLIED SCIENCES-BASEL, 2024, 14 (02):
  • [33] MIMO-SPEECH: END-TO-END MULTI-CHANNEL MULTI-SPEAKER SPEECH RECOGNITION
    Chang, Xuankai
    Zhang, Wangyou
    Qian, Yanmin
    Le Roux, Jonathan
    Watanabe, Shinji
    2019 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU 2019), 2019, : 237 - 244
  • [34] Normalization Driven Zero-shot Multi-Speaker Speech Synthesis
    Kumar, Neeraj
    Goel, Srishti
    Narang, Ankur
    Lall, Brejesh
    INTERSPEECH 2021, 2021, : 1354 - 1358
  • [35] A Purely End-to-end System for Multi-speaker Speech Recognition
    Seki, Hiroshi
    Hori, Takaaki
    Watanabe, Shinji
    Le Roux, Jonathan
    Hershey, John R.
    PROCEEDINGS OF THE 56TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL), VOL 1, 2018, : 2620 - 2630
  • [36] ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis
    Xue, Jinlong
    Deng, Yayue
    Han, Yichen
    Li, Ya
    Sun, Jianqing
    Liang, Jiaen
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 230 - 234
  • [37] Multi-Speaker Adaptation for Robust Speech Recognition under Ubiquitous Environment
    Shih, Po-Yi
    Wang, Jhing-Fa
    Lin, Yuan-Ning
    Fu, Zhong-Hua
    ORIENTAL COCOSDA 2009 - INTERNATIONAL CONFERENCE ON SPEECH DATABASE AND ASSESSMENTS, 2009, : 126 - 131
  • [38] Deep Voice 2: Multi-Speaker Neural Text-to-Speech
    Arik, Sercan O.
    Diamos, Gregory
    Gibiansky, Andrew
    Miller, John
    Peng, Kainan
    Ping, Wei
    Raiman, Jonathan
    Zhou, Yanqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [39] End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
    Denisov, Pavel
    Ngoc Thang Vu
    INTERSPEECH 2019, 2019, : 4425 - 4429
  • [40] INVESTIGATING ON INCORPORATING PRETRAINED AND LEARNABLE SPEAKER REPRESENTATIONS FOR MULTI-SPEAKER MULTI-STYLE TEXT-TO-SPEECH
    Chien, Chung-Ming
    Lin, Jheng-Hao
    Huang, Chien-yu
    Hsu, Po-chun
    Lee, Hung-yi
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8588 - 8592