Multimodal Physiological Quality-of-Experience Assessment of Text-to-Speech Systems

Times cited: 11

Authors:

Gupta, Rishabh [1]

Banville, Hubert J. [1]

Falk, Tiago H. [1]

Affiliation:

[1] Univ Quebec, INRS EMT, Montreal, PQ H2L 2C4, Canada

Source:

IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING | 2017 / Vol. 11 / No. 1

Funding:

Natural Sciences and Engineering Research Council of Canada (NSERC);

Keywords:

Electroencephalography; functional near-infrared spectroscopy; human factors; multimodal fusion; QoE; EVENT-RELATED DESYNCHRONIZATION; HEART-RATE-VARIABILITY; ALPHA OSCILLATIONS; EEG; SYNCHRONIZATION; SPECTROSCOPY; SIGNAL; FMRI; RECOGNITION; PREDICTION;

DOI:

10.1109/JSTSP.2016.2638538

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808; 0809;

Abstract:

With the growing complexity of various text-to-speech systems, it is becoming more important to understand the underlying perceptual and judgement processes that drive user Quality-of-Experience (QoE) perception. Typical QoE assessment techniques, such as listening tests with self-report ratings, are useful but provide limited insight into these underlying processes. Recent advances in neuroimaging and physiological monitoring technologies, however, have opened new doors and allowed us to better understand and measure QoE perception. In this paper, we explore the use of two neuroimaging techniques, namely electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), to better understand neuronal and cerebral haemodynamic changes resulting from synthesized speech of varying quality. Neural correlates of several QoE dimensions were derived and validated on the publicly available PhySyQX database. Fusion of EEG, fNIRS, and fNIRS-derived physiological parameters, combined with conventional features extracted from the synthesized speech signal, was shown to accurately represent several QoE dimensions, including those related to listener affective states. It is hoped that these findings will help researchers build better instrumental QoE models that incorporate technological, contextual, and human influence factors.
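The abstract does not say how the modalities were fused. As a minimal sketch, assuming simple feature-level (early) fusion followed by a ridge regressor (neither of which is confirmed by this record), combining EEG, fNIRS, and speech-signal features into one QoE predictor could look like the following. All function names, feature counts, and the synthetic data are hypothetical and are not drawn from PhySyQX.

```python
import numpy as np


def zscore(x: np.ndarray) -> np.ndarray:
    """Standardize each feature column to zero mean and unit variance."""
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant features at zero instead of dividing by zero
    return (x - mu) / sd


def fuse_features(*blocks: np.ndarray) -> np.ndarray:
    """Hypothetical early fusion: z-score each modality, then concatenate columns."""
    n = blocks[0].shape[0]
    if any(b.shape[0] != n for b in blocks):
        raise ValueError("all modalities must have the same number of trials")
    return np.hstack([zscore(b) for b in blocks])


def add_intercept(x: np.ndarray) -> np.ndarray:
    """Prepend a column of ones so the model learns a bias term."""
    return np.hstack([np.ones((x.shape[0], 1)), x])


def fit_ridge(x: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Closed-form ridge regression; the intercept is not penalized."""
    x1 = add_intercept(x)
    penalty = alpha * np.eye(x1.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(x1.T @ x1 + penalty, x1.T @ y)


def predict(w: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply fitted ridge weights to (already fused) features."""
    return add_intercept(x) @ w


# Synthetic demo data only (random numbers, not PhySyQX recordings).
rng = np.random.default_rng(0)
n_trials = 40
eeg = rng.normal(size=(n_trials, 5))     # stand-in for EEG features
fnirs = rng.normal(size=(n_trials, 3))   # stand-in for fNIRS features
speech = rng.normal(size=(n_trials, 2))  # stand-in for speech-signal features

fused = fuse_features(eeg, fnirs, speech)
# Synthetic "rating" that depends on one EEG and one fNIRS column plus noise.
ratings = 3.0 + fused[:, 0] - 0.5 * fused[:, 5] + 0.1 * rng.normal(size=n_trials)

w = fit_ridge(fused, ratings, alpha=0.1)
predicted = predict(w, fused)
```

In a real evaluation, the z-score statistics and the regressor would be fit on training trials only and then applied to held-out trials; the in-sample fit above only checks that the pieces connect.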


Pages: 22-36

Number of pages: 15
