Analysis of Features and Metrics for Alignment in Text-Dependent Voice Conversion

Cited by: 5
Authors
Shah, Nirmesh J. [1]
Patil, Hemant A. [1]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, India
Keywords
Gaussian Mixture Model; Spectral features; Posterior features; Recognition
DOI
10.1007/978-3-319-69900-4_38
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Voice Conversion (VC) is a technique that converts the perceived speaker identity from a source speaker to a target speaker. In text-dependent VC, given a parallel training speech database of the source and target speakers, the first task is to align the source and target speakers' spectral features at the frame level before learning the mapping function. The accuracy of this alignment affects the learning of the mapping function and hence the quality of the converted voice. The impact of alignment has not been explored much in the VC literature. Most alignment techniques align acoustic features (namely, spectral features such as Mel Cepstral Coefficients (MCC)). However, spectral features represent both speaker-specific and speech-specific information. In this paper, we analyze the use of different speaker-independent features for the alignment task, namely unsupervised posterior features: Gaussian Mixture Model (GMM)-based posterior features and GMM-UBM-based posterior features, i.e., those obtained by Maximum A Posteriori (MAP) adaptation from a Universal Background Model (UBM). In addition, we propose to use different metrics for the alignment, such as symmetric Kullback-Leibler (KL) and cosine distances, instead of the Euclidean distance. Our analysis, based on % Phone Accuracy (PA), correlates with the subjective scores of the developed VC systems, with a Pearson correlation coefficient of 0.98.
Pages: 299-307
Number of pages: 9
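Illustrative sketch
The abstract describes frame-level alignment of source and target features under different distance metrics (Euclidean, cosine, symmetric KL). The snippet below is a minimal sketch of that idea, not the authors' implementation. It assumes Dynamic Time Warping (DTW) as the alignment algorithm, which the abstract does not name. The function names (euclidean, cosine_distance, symmetric_kl, dtw_align) and the toy two-component posterior vectors are hypothetical and chosen only for illustration.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """One minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / norm


def symmetric_kl(p, q, eps=1e-10):
    """KL(p||q) + KL(q||p) for two posterior (probability) vectors.

    Entries are floored at eps (an assumed value) to avoid log(0).
    """
    total = 0.0
    for a, b in zip(p, q):
        a, b = max(a, eps), max(b, eps)
        total += (a - b) * math.log(a / b)
    return total


def dtw_align(src, tgt, dist):
    """Align two frame sequences with plain DTW (assumed algorithm).

    Returns the warping path as (source_index, target_index) pairs
    and the accumulated cost of that path.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path, cost[n][m]


# Toy posterior sequences (invented for illustration): each frame is a
# probability vector over two hypothetical GMM components.
source_frames = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
target_frames = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

for metric in (euclidean, cosine_distance, symmetric_kl):
    path, total = dtw_align(source_frames, target_frames, metric)
    print(metric.__name__, path, total)
```

Passing the metric as a function keeps the alignment routine unchanged when the distance is swapped, which mirrors the comparison the abstract describes. Symmetric KL is only meaningful for posterior (probability) vectors, whereas the Euclidean and cosine distances also apply to raw spectral features such as MCC.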