Analysis of Features and Metrics for Alignment in Text-Dependent Voice Conversion

Cited by: 5
Authors
Shah, Nirmesh J. [1]
Patil, Hemant A. [1]
Affiliations
[1] DA IICT, Speech Res Lab, Gandhinagar, India
Keywords
Gaussian Mixture Model; Spectral features; Posterior features; RECOGNITION;
DOI
10.1007/978-3-319-69900-4_38
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Voice Conversion (VC) is a technique that converts the perceived speaker identity from a source speaker to a target speaker. In text-dependent VC, given a parallel training speech database of the source and target speakers, the first task is to align the source and target speakers' spectral features at the frame level before learning the mapping function. The accuracy of the alignment affects the learning of the mapping function and hence the quality of the converted voice. The impact of alignment has not been much explored in the VC literature. Most alignment techniques try to align acoustic features (namely, spectral features such as Mel Cepstral Coefficients (MCC)). However, spectral features represent both speaker-specific and speech-specific information. In this paper, we analyze the use of different speaker-independent features (namely, unsupervised posterior features, such as Gaussian Mixture Model (GMM)-based posterior features and GMM-UBM-based posterior features, i.e., those obtained from a GMM that is Maximum A Posteriori (MAP) adapted from a Universal Background Model (UBM)) for the alignment task. In addition, we propose to use different metrics, such as the symmetric Kullback-Leibler (KL) and cosine distances, instead of the Euclidean distance for the alignment. Our analysis, based on % Phone Accuracy (PA), correlates with the subjective scores of the developed VC systems with a Pearson correlation coefficient of 0.98.
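As an illustration of how the choice of metric enters frame-level alignment, the following is a minimal sketch, not the authors' implementation. It assumes dynamic time warping (DTW) as the aligner, which the abstract does not name, and it treats each frame as a posterior vector (non-negative entries summing to one). The function names, the `EPS` floor, and the toy data are all hypothetical.

```python
import math

EPS = 1e-10  # hypothetical floor, avoids log(0) and division by zero


def euclidean(p, q):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cosine_distance(p, q):
    """One minus the cosine similarity of two feature vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return 1.0 - dot / max(norm, EPS)


def symmetric_kl(p, q):
    """Symmetric KL: KL(p||q) + KL(q||p) = sum_k (p_k - q_k) * log(p_k / q_k)."""
    total = 0.0
    for a, b in zip(p, q):
        a, b = max(a, EPS), max(b, EPS)
        total += (a - b) * math.log(a / b)
    return total


def dtw_align(src, tgt, dist):
    """Align two frame sequences with DTW under the given frame distance.

    Returns the warping path as a list of (source_index, target_index) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(src[i - 1], tgt[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


# Toy posterior sequences over two hypothetical GMM components.
source_frames = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]
target_frames = [[0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]
path_kl = dtw_align(source_frames, target_frames, symmetric_kl)
path_euc = dtw_align(source_frames, target_frames, euclidean)
path_cos = dtw_align(source_frames, target_frames, cosine_distance)
```

Passing the distance as a parameter keeps the aligner unchanged while the metric is swapped, which mirrors the comparison the abstract describes. On real data the three metrics would generally yield different warping paths; the toy sequences here are too simple to show that.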
Pages: 299-307
Number of pages: 9