Age-Based Automatic Voice Conversion Using Blood Relation for Voice Impaired

Cited by: 3
Authors
Padmini, Palli [1]
Paramasivam, C. [1]
Lal, G. Jyothish [2]
Alharbi, Sadeen [3]
Bhowmick, Kaustav [4]
Affiliations
[1] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Dept Elect & Commun Engn, Bengaluru, India
[2] Amrita Vishwa Vidyapeetham, Amrita Sch Engn, Ctr Computat Engn & Networking CEN, Coimbatore, Tamil Nadu, India
[3] King Saud Univ, Coll Comp & Informat Sci, Dept Software Engn, Riyadh, Saudi Arabia
[4] PES Univ, Dept Elect & Commun Engn, Bengaluru, India
Source
CMC-COMPUTERS MATERIALS & CONTINUA | 2022 / Vol. 70 / Issue 02
Keywords
Blood relations; KFCG; LBG; MFCC; vector quantization; correlation; speech samples; same-gender; dissimilar gender; voice conversion; PSOLA; SVM; ALGORITHM; SPEECH; PREVALENCE; CHILDREN; DELAY;
DOI
10.32604/cmc.2022.020065
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This work presents a statistical method for translating human voices across age groups, based on commonalities in the voices of blood relations. For the social compatibility of the voice-impaired, the age-translated voices were naturalized by extracting blood-relation features (e.g., pitch, duration, and energy) using Mel Frequency Cepstrum Coefficients (MFCC). The system was demonstrated using standard English and an Indian language. The voice samples for resynthesis were derived from 12 families, with members aged 8 to 80 years. The voice-age translation, performed with the pitch synchronous overlap and add (PSOLA) approach by modulating the extracted voice features, was validated by a perception test. The translated and resynthesized voices were correlated using the Linde, Buzo, Gray (LBG) and Kekre's Fast Codebook Generation (KFCG) algorithms. For translated voice targets, a strong correlation (θ ≲ 93% and θ ≲ 96%) was found with blood relatives, whereas a weak correlation range (θ ≲ 78% and θ ≲ 80%) was found between different families and between different genders within the same family. The study further subcategorized the sampling and synthesis of the voices into similar- or dissimilar-gender groups, using a support vector machine (SVM) to choose between available voice samples. Finally, accuracies of ~96%, ~93%, and ~94% were obtained in identifying the gender of the voice sample, the age group of the samples, and the correlation between the original and converted voice samples, respectively. The results were close to the natural voice sample features and are envisaged to facilitate a near-natural voice for the speech-impaired.
Pages: 4027-4051
Page count: 25
Related papers
50 in total
  • [31] Voice Conversion Using Gaussian Mixture Models
    D'souza, Kevin
    Talele, K. T. V.
    2015 INTERNATIONAL CONFERENCE ON COMMUNICATION, INFORMATION & COMPUTING TECHNOLOGY (ICCICT), 2015,
  • [32] VOICE CONVERSION USING ARTIFICIAL NEURAL NETWORKS
    Desai, Srinivas
    Raghavendra, E. Veera
    Yegnanarayana, B.
    Black, Alan W.
    Prahallad, Kishore
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS, 2009, : 3893 - +
  • [33] Voice conversion using support vector regression
    Song, P.
    Bao, Y. Q.
    Zhao, L.
    Zou, C. R.
    ELECTRONICS LETTERS, 2011, 47 (18) : 1045 - U1586
  • [34] Voice conversion using HMM combined with GMM
    Yue Zhenjun
    Zou Xiang
    Jia Yongxing
    Wang Hao
    CISP 2008: FIRST INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOL 5, PROCEEDINGS, 2008, : 366 - 370
  • [35] Effect of voice imitation using voice conversion by avatar on customer service in Virtual Environments
    Okano, Hiiro
    Wakatsuki, Naoto
    Okada, Yukihiko
    Zempo, Keiichi
    29TH ACM SYMPOSIUM ON VIRTUAL REALITY SOFTWARE AND TECHNOLOGY, VRST 2023, 2023,
  • [36] Statistical Voice Conversion using GA-based Informative Feature
    Sawada, Kohei
    Tagami, Yoji
    Tamura, Satoshi
    Takehara, Masanori
    Hayamizu, Satoru
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [37] Unified model for voice conversion of speech and singing voice using adaptive pitch constraints
    Fukawa, Shogo
    Nose, Takashi
    Imai, Shuhei
    Ito, Akinori
    ACOUSTICAL SCIENCE AND TECHNOLOGY, 2025, 46 (01) : 120 - 123
  • [38] Voice and Irish Based Automatic Moving Camera
    Haque, A. K. M. Fazlul
    Rahman, Mohammad Mahfujur
    Khatun, Amena
    Younus, Muhammad
    Chowdhury, Jahanara Fardous
    2015 4TH INTERNATIONAL CONFERENCE ON RELIABILITY, INFOCOM TECHNOLOGIES AND OPTIMIZATION (ICRITO) (TRENDS AND FUTURE DIRECTIONS), 2015,
  • [39] Voice Conversion Based on Locally Linear Embedding
    Hwang, Hsin-Te
    Wu, Yi-Chiao
    Peng, Yu-Huai
    Hsu, Chin-Cheng
    Tsao, Yu
    Wang, Hsin-Min
    Wang, Yih-Ru
    Chen, Sin-Horng
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2018, 34 (06) : 1493 - 1516
  • [40] Voice Conversion Based on Mixtures of Factor Analyzers
    Uto, Yosuke
    Nankaku, Yoshihiko
    Toda, Tomoki
    Lee, Akinobu
    Tokuda, Keiichi
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 2278 - +