TOWARDS MULTI-SPEAKER UNSUPERVISED SPEECH PATTERN DISCOVERY

Cited by: 40
Authors
Zhang, Yaodong [1]
Glass, James R. [1]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
Keywords
unsupervised learning; language acquisition
DOI
10.1109/ICASSP.2010.5495637
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we explore the use of a Gaussian posteriorgram based representation for unsupervised discovery of speech patterns. Compared with our previous work, the new approach provides significant improvement towards speaker independence. The framework consists of three main procedures: a Gaussian posteriorgram generation procedure which learns an unsupervised Gaussian mixture model and labels each speech frame with a Gaussian posteriorgram representation; a segmental dynamic time warping procedure which locates pairs of similar sequences of Gaussian posteriorgram vectors; and a graph clustering procedure which groups similar sequences into clusters. We demonstrate the viability of using the posteriorgram approach to handle many talkers by finding clusters of words in the TIMIT corpus.
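The first two procedures named in the abstract can be illustrated with a minimal sketch. The abstract does not give implementation details, so everything specific here is an assumption made for illustration: the diagonal-covariance mixture, the number of components, the hypothetical function names (`fit_diag_gmm`, `posteriorgram`, `dtw_distance`), and the local distance (negative log of the inner product of two posterior vectors). The alignment shown is plain whole-sequence dynamic time warping rather than the segmental variant, and the graph clustering step is omitted.

```python
import numpy as np

def fit_diag_gmm(frames, n_components, n_iter=20, seed=0):
    """Fit a diagonal-covariance GMM with plain EM (illustrative, unoptimized)."""
    rng = np.random.default_rng(seed)
    n, _ = frames.shape
    # Initialize means from randomly chosen frames (an assumed initialization).
    means = frames[rng.choice(n, n_components, replace=False)].copy()
    variances = np.tile(frames.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        post = posteriorgram(frames, weights, means, variances)  # E-step
        counts = post.sum(axis=0) + 1e-10                        # M-step below
        weights = counts / counts.sum()
        means = (post.T @ frames) / counts[:, None]
        second_moment = (post.T @ frames**2) / counts[:, None]
        variances = np.maximum(second_moment - means**2, 1e-6)
    return weights, means, variances

def posteriorgram(frames, weights, means, variances):
    """Return an (n_frames, n_components) matrix of P(component | frame)."""
    diff = frames[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.sum(diff**2 / variances, axis=2)
                      + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = log_pdf + np.log(weights)
    log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def dtw_distance(post_a, post_b):
    """Length-normalized DTW cost between two posteriorgram sequences."""
    # Assumed local distance: -log of the inner product of posterior vectors.
    local = -np.log(np.clip(post_a @ post_b.T, 1e-10, None))
    n, m = local.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)

if __name__ == "__main__":
    # Synthetic 2-D "frames" stand in for real acoustic features.
    rng = np.random.default_rng(0)
    frames = np.vstack([rng.normal(c, 0.3, size=(200, 2))
                        for c in (-3.0, 0.0, 3.0)])
    gmm = fit_diag_gmm(frames, n_components=3)
    post = posteriorgram(frames, *gmm)
    print(post.shape)
    print(dtw_distance(post[:50], post[50:100]))
    print(dtw_distance(post[:50], post[400:450]))
```

In this sketch each frame is replaced by a vector of component posteriors, so two sequences are compared by which mixture components they activate rather than by raw feature values. A full pattern-discovery pipeline would still need the segmental search for matching subsequences and a clustering stage on top of the pairwise matches.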
Pages: 4366-4369
Page count: 4