Voice Privacy Using Time-Scale and Pitch Modification

Authors
Singh D.K. [1]
Prajapati G.P. [1]
Patil H.A. [1]
Affiliation
[1] Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar
Keywords
Anonymization; Data augmentation; Speech perturbation; Voice privacy
DOI
10.1007/s42979-023-02549-8
Abstract
There is growing demand for the digitization of day-to-day work, and hence a surge in the use of Intelligent Personal Assistants. Because these smart digital assistants rely on personally identifiable characteristics of the user, their extensive use calls for security and privacy preservation techniques. Various privacy preservation techniques have been explored for different types of voice assistants, and voice-based digital assistants in particular need such a technique. In this study, we therefore explore prosody modification methods that alter the speaker-specific characteristics of the user, so that the modified utterances can be made publicly available for training different speech-based systems. This study presents three data augmentation techniques (speed, pitch, and tempo perturbation) as voice anonymization methods that modify the speaker-dependent speech parameters (i.e., F0). Voice anonymization and speech intelligibility are measured objectively using automatic speaker verification (ASV) and automatic speech recognition (ASR) experiments, respectively, on the development and test sets of the Librispeech dataset. For speed perturbation-based anonymization, a relative increase in % EER of up to 53.7% is observed for a perturbation factor of α = 0.8, for both male and female speakers. For the same case, the % WER was adequate (lower than that of the baseline system), which supports the use of speed perturbation as the anonymization algorithm in a voice privacy system. Similar performance is observed for pitch perturbation with a perturbation factor of λ = -300. However, tempo perturbation was not found to be useful for speaker anonymization in these experiments, with % EER on the order of 5–10%. © 2024, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
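To make the idea of speed perturbation concrete, here is a minimal sketch that resamples a waveform by a factor α using linear interpolation. It is an illustration only, not the authors' implementation: the function name `speed_perturb` is hypothetical, and the convention that α < 1 slows the signal down (output length of about N/α, with duration and pitch changing together) is an assumption, since the abstract does not define α.

```python
import numpy as np


def speed_perturb(signal: np.ndarray, alpha: float) -> np.ndarray:
    """Resample `signal` so that it plays `alpha` times faster.

    Assumed convention (not stated in the abstract): the output has about
    len(signal) / alpha samples, so alpha < 1 lengthens the signal and
    lowers its pitch when played back at the original sampling rate.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n_out = int(round(len(signal) / alpha))
    # Fractional positions in the original signal that each output sample reads.
    src_positions = np.arange(n_out) * alpha
    # Linear interpolation; positions past the last sample are clamped by np.interp.
    return np.interp(src_positions, np.arange(len(signal)), signal)


# Toy input: a one-second 100 Hz tone sampled at 16 kHz (stand-in for speech).
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100.0 * t)

slowed = speed_perturb(tone, alpha=0.8)
print(len(tone), len(slowed))
```

Under this convention the perturbed signal is longer and its fundamental frequency is scaled by α, which is why speed perturbation changes a speaker-dependent cue such as F0. A production pipeline would normally use a band-limited resampler rather than linear interpolation.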