On the Assessment of High-Quality Voice Recordings including Voice Postprocessing

Cited by: 2
Authors
Beerends, John G. [1]
Beerends, Imre [2]
Affiliations
[1] TNO, NL-2509 JE The Hague, Netherlands
[2] Mantis Audio, Wateringen, Netherlands
Source
Keywords
ITU-T STANDARD; ASSESSMENT POLQA;
DOI
10.17743/jaes.2015.0013
Chinese Library Classification code
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
When we assess the quality of a voice recording, two different aspects play a role: the voice characteristics (voice quality) and the audio chain characteristics (audio quality). Subjective experiments in which no clear ideal reference is provided, so-called absolute category rating experiments, assess the speech quality, i.e., the combined effect of voice and audio quality. This paper investigates whether voice postprocessing such as timbre optimization, loudness optimization, de-essing, room reverberation optimization, and (background) noise suppression can improve the quality of a high-quality voice recording. It turned out that none of the processing provides a significant improvement in perceived quality. The best postprocessing is noise reduction to absolute silence, which delivers only a non-significant improvement when the voice recording is of high quality. The subjective quality evaluations show a significant preference for male over female voices and a significant effect of speaker/sentence dependency on the perceived quality of certain types of degradation. The subjective results are compared with predictions made with POLQA, the ITU-T standard for the objective assessment of speech quality (ITU-T Recommendation P.863, versions 1.1 and 2.4); the comparison shows that many speech quality effects are predicted correctly, at the condition level as well as at the individual sentence level.
Pages: 174-183
Page count: 10