Analysis of Features and Metrics for Alignment in Text-Dependent Voice Conversion

Cited by: 5

Authors:

Shah, Nirmesh J. [1]

Patil, Hemant A. [1]

Affiliations:

[1] DA IICT, Speech Res Lab, Gandhinagar, India

Source:

PATTERN RECOGNITION AND MACHINE INTELLIGENCE, PREMI 2017 | 2017 / Vol. 10597

Keywords:

Gaussian Mixture Model; Spectral features; Posterior features; RECOGNITION;

DOI:

10.1007/978-3-319-69900-4_38

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Voice Conversion (VC) is a technique that converts the perceived speaker identity from a source speaker to a target speaker. In text-dependent VC, given a parallel training speech database of the source and target speakers, the first task is to align the source and target speakers' spectral features at the frame level before learning the mapping function. The accuracy of the alignment affects the learning of the mapping function and, hence, the quality of the converted voice. The impact of alignment has not been much explored in the VC literature. Most alignment techniques align acoustic features (namely, spectral features such as Mel Cepstral Coefficients (MCC)). However, spectral features represent both speaker-specific and speech-specific information. In this paper, we analyze the use of different speaker-independent features for the alignment task, namely unsupervised posterior features: Gaussian Mixture Model (GMM)-based posterior features and GMM-UBM-based posterior features, i.e., those obtained by Maximum A Posteriori (MAP) adaptation from a Universal Background Model (UBM). In addition, we propose to use different metrics, such as symmetric Kullback-Leibler (KL) and cosine distances, instead of the Euclidean distance for the alignment. Our analysis, based on % Phone Accuracy (PA), correlates with the subjective scores of the developed VC systems, with a Pearson correlation coefficient of 0.98.
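
The abstract names the distance metrics but gives no implementation details. As a rough illustration only, the sketch below aligns two sequences of posterior feature vectors with a swappable frame-level distance. It assumes that each frame is a probability vector and that the alignment is done with plain dynamic time warping (DTW); the abstract does not name the alignment algorithm, so DTW and all function names here (`symmetric_kl`, `cosine_distance`, `euclidean_distance`, `dtw_align`) are hypothetical choices, not the authors' method.

```python
import numpy as np


def symmetric_kl(p, q, eps=1e-10):
    """Symmetric KL distance KL(p||q) + KL(q||p) between two posterior vectors."""
    # Clip and renormalize so that zero posteriors do not produce log(0).
    p = np.clip(np.asarray(p, dtype=float), eps, None)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))


def cosine_distance(p, q):
    """One minus the cosine similarity of two feature vectors."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 1.0 - float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))


def euclidean_distance(p, q):
    """Euclidean distance, the baseline metric mentioned in the abstract."""
    return float(np.linalg.norm(np.asarray(p, dtype=float) - np.asarray(q, dtype=float)))


def dtw_align(src, tgt, dist):
    """Align two frame sequences with basic DTW (an assumed alignment method).

    Returns a list of (source_frame, target_frame) index pairs.
    """
    n, m = len(src), len(tgt)
    cost = np.array([[dist(s, t) for t in tgt] for s in src])

    # acc[i, j] holds the cheapest cumulative cost of aligning the
    # first i source frames with the first j target frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1]
            )

    # Backtrack from the last frame pair to the first one.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min(
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        )
        path.append((i - 1, j - 1))
    return path[::-1]
```

Passing `symmetric_kl` or `cosine_distance` as `dist` swaps the frame-level metric without touching the alignment routine, which mirrors the kind of comparison the abstract describes. How the posterior features themselves are extracted (GMM or GMM-UBM) is outside this sketch.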


Pages: 299-307

Number of pages: 9
