Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy

Cited by: 0

Authors:

Singla, Karan [1]

Chen, Zhuohao [1]

Flemotomos, Nikolaos [1]

Gibson, James [1]

Can, Dogan [1]

Atkins, David C. [2]

Narayanan, Shrikanth [1]

Affiliations:

[1] University of Southern California, Signal Analysis and Interpretation Lab, Los Angeles, CA, USA

[2] University of Washington, Department of Psychiatry and Behavioral Sciences, Seattle, WA 98195, USA

Source:

19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES | 2018

Keywords:

prosody; multimodal learning; behavioral signal processing

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper, we present an approach for predicting utterance-level behaviors in psychotherapy sessions using both speech and lexical features. We train long short-term memory (LSTM) networks with an attention mechanism on word-level inputs, namely words (both manually and automatically transcribed) and prosodic features, to predict the annotated behaviors. We demonstrate that prosodic features provide discriminative information relevant to the behavior prediction task and show that they improve prediction when fused with automatically derived lexical features. Additionally, we investigate the weights of the attention mechanism to determine which words and prosodic patterns are important to the behavior prediction task.
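
The attention-weight inspection described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no architectural details, so the example assumes simple concatenation as the fusion step and plain dot-product attention, and it uses the fused word-level features directly where the paper would use LSTM outputs. All names (`fuse`, `attention_pool`, `query`) and all numeric values are hypothetical toy choices.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(lexical, prosodic):
    """Assumed fusion: concatenate each word's lexical and prosodic vectors."""
    return [lex + pro for lex, pro in zip(lexical, prosodic)]


def attention_pool(states, query):
    """Dot-product attention over word-level states.

    Returns the per-word attention weights and the weighted sum of states,
    which would serve as the utterance-level representation.
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    pooled = [
        sum(w * state[d] for w, state in zip(weights, states))
        for d in range(dim)
    ]
    return weights, pooled


# Toy utterance: one lexical vector and one prosodic value per word.
words = ["i", "really", "want", "to", "quit"]
lexical = [[0.1, 0.0], [0.4, 0.2], [0.9, 0.1], [0.0, 0.1], [1.0, 0.8]]
prosodic = [[0.2], [0.9], [0.3], [0.1], [0.7]]  # e.g. a normalised pitch value

states = fuse(lexical, prosodic)   # stand-in for LSTM outputs
query = [1.0, 1.0, 1.0]            # hypothetical learned attention vector

weights, pooled = attention_pool(states, query)

# Rank words by attention weight to see which ones the model attends to most.
ranked = sorted(zip(words, weights), key=lambda pair: pair[1], reverse=True)
for word, weight in ranked:
    print(f"{word:>7s}  {weight:.3f}")
```

Sorting words by weight is one simple way to read off which words, together with their prosodic values, contribute most to the pooled representation. In a trained model the query vector and the word states would be learned rather than fixed by hand.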

Pages: 3413-3417

Page count: 5
