A Comprehensive Analysis on Features and Performance Evaluation Metrics in Audio-Visual Voice Conversion

Authors
Ghosh, Subhayu [1]
Dhar, Sandipan [1]
Jana, Nanda Dulal [1]
Affiliations
[1] Natl Inst Technol Durgapur, Durgapur, India
Keywords
Audio-Visual Voice Conversion; Feature Extraction; Evaluation Metrics; Objective Evaluation; Subjective Evaluation; SPEECH; QUALITY;
DOI
10.1007/978-3-031-64070-4_19
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Audio-Visual Voice Conversion (AVVC) is an emerging research field within the realm of audio-visual speech synthesis, involving the transformation of both vocal characteristics and lip movements from a source speaker to a target speaker while preserving linguistic content. Unlike conventional Voice Conversion (VC), AVVC incorporates visual cues alongside speech features to facilitate cross-domain transformations. This technology is driven by advancements in deep learning (DL) algorithms, which have supplanted traditional statistical methods in AVVC model enhancements. Despite these advancements, evaluating the quality of AVVC-generated audio and video samples remains a formidable challenge within the research community. This paper systematically analyzes the essential features employed in AVVC models, encompassing both spectral and prosodic attributes. Furthermore, the paper delves into the myriad performance evaluation metrics utilized for assessing the efficacy of these models, including subjective and objective measures. The critical examination of these metrics sheds light on their applicability in the context of audio-visual voice conversion, highlighting the challenges and considerations specific to this field. The extraction of features and analysis of performance evaluation metrics provide a holistic understanding of the challenges and opportunities in this emerging field, aiming to contribute to the advancement of AVVC technologies.
Pages: 303-318
Page count: 16