Multimodal Physiological Quality-of-Experience Assessment of Text-to-Speech Systems

Citations: 11
Authors
Gupta, Rishabh [1]
Banville, Hubert J. [1]
Falk, Tiago H. [1]
Affiliation
[1] Univ Quebec, INRS EMT, Montreal, PQ H2L 2C4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
Electroencephalography; functional near-infrared spectroscopy; human factors; multimodal fusion; QoE; event-related desynchronization; heart-rate variability; alpha oscillations; EEG; synchronization; spectroscopy; signal; fMRI; recognition; prediction
DOI
10.1109/JSTSP.2016.2638538
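Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline classification code
0808; 0809
Abstract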
With the growing complexity of text-to-speech systems, it is becoming more important to understand the underlying perceptual and judgement processes that drive user Quality-of-Experience (QoE) perception. Typical QoE assessment techniques, such as listening tests with self-report ratings, are useful but provide limited insight into these underlying processes. Recent advances in neuroimaging and physiological monitoring technologies, however, have opened new doors and allowed us to better understand and measure QoE perception. In this paper, we explore the use of two neuroimaging techniques, namely electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), to better understand the neuronal and cerebral haemodynamic changes resulting from synthesized speech of varying quality. Neural correlates of several QoE dimensions were derived and validated on the publicly available PhySyQX database. Fusion of EEG, fNIRS, and fNIRS-derived physiological parameters, combined with conventional features extracted from the synthesized speech signal, was shown to accurately represent several QoE dimensions, including those related to listener affective states. It is hoped that these findings will help researchers build better instrumental QoE models that incorporate technological, contextual, and human influence factors.
Pages: 22-36
Number of pages: 15