Using Prosodic and Lexical Information for Learning Utterance-level Behaviors in Psychotherapy

Cited by: 0
Authors
Singla, Karan [1 ]
Chen, Zhuohao [1 ]
Flemotomos, Nikolaos [1 ]
Gibson, James [1 ]
Can, Dogan [1 ]
Atkins, David C. [2 ]
Narayanan, Shrikanth [1 ]
Affiliations
[1] University of Southern California, Signal Analysis and Interpretation Laboratory, Los Angeles, CA, USA
[2] University of Washington, Department of Psychiatry and Behavioral Sciences, Seattle, WA 98195, USA
Keywords
prosody; multimodal learning; behavioral signal processing
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present an approach for predicting utterance-level behaviors in psychotherapy sessions using both speech and lexical features. We train long short-term memory (LSTM) networks with an attention mechanism on both manually and automatically transcribed words, together with word-level prosodic features, to predict the annotated behaviors. We demonstrate that prosodic features provide discriminative information relevant to the behavior task and show that they improve prediction when fused with automatically derived lexical features. Additionally, we examine the attention weights to identify the words and prosodic patterns that are most important for the behavior prediction task.
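The abstract describes the modeling approach only at a high level: word embeddings are fused with word-level prosodic features and passed through an LSTM whose attention mechanism pools the word representations into an utterance-level prediction of the behavior code. The following is a minimal PyTorch sketch of that kind of architecture, not the authors' implementation; the class name, bidirectionality, feature dimensions, additive attention form, and number of behavior codes are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's code): word embeddings
# concatenated with word-level prosodic features, a bidirectional LSTM, additive
# attention over word positions, and a linear classifier over behavior codes.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionLSTMBehaviorClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, prosody_dim=6,
                 hidden_dim=128, num_behaviors=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # Lexical and prosodic streams are fused by concatenation at the word level.
        self.lstm = nn.LSTM(embed_dim + prosody_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 1)           # one score per word
        self.classifier = nn.Linear(2 * hidden_dim, num_behaviors)

    def forward(self, word_ids, prosody, mask):
        # word_ids: (batch, seq_len) token indices
        # prosody:  (batch, seq_len, prosody_dim) word-level prosodic features
        # mask:     (batch, seq_len), 1 for real words, 0 for padding
        x = torch.cat([self.embed(word_ids), prosody], dim=-1)
        h, _ = self.lstm(x)                                 # (batch, seq_len, 2*hidden)
        scores = self.attn(h).squeeze(-1)                   # (batch, seq_len)
        scores = scores.masked_fill(mask == 0, float("-inf"))
        alpha = F.softmax(scores, dim=-1)                   # attention weight per word
        context = torch.bmm(alpha.unsqueeze(1), h).squeeze(1)
        return self.classifier(context), alpha              # logits and attention weights


# Toy usage: 2 utterances, 7 words each, 6 prosodic features per word.
model = AttentionLSTMBehaviorClassifier(vocab_size=1000)
logits, attn = model(torch.randint(1, 1000, (2, 7)),
                     torch.randn(2, 7, 6),
                     torch.ones(2, 7))
```

Returning the attention weights alongside the logits mirrors the analysis the abstract mentions, in which attention weights are inspected to identify words and prosodic patterns that matter for the prediction.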
Pages: 3413-3417
Number of pages: 5
Related Papers (50 records in total)
  • [1] Newborns discriminate utterance-level prosodic contours
    Martinez-Alvarez, Anna; Benavides-Varela, Silvia; Lapillonne, Alexandre; Gervain, Judit
    DEVELOPMENTAL SCIENCE, 2023, 26 (02)
  • [2] Learning utterance-level normalisation using variational autoencoders for robust automatic speech recognition
    Tan, Shawn; Sim, Khe Chai
    2016 IEEE WORKSHOP ON SPOKEN LANGUAGE TECHNOLOGY (SLT 2016), 2016: 43-49
  • [3] Emotion-controllable speech synthesis using emotion soft label, utterance-level prosodic factors, and word-level prominence
    Luo, Xuan; Takamichi, Shinnosuke; Saito, Yuki; Koriyama, Tomoki; Saruwatari, Hiroshi
    APSIPA TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING, 2024, 13 (01)
  • [4] Learning utterance-level representations with label smoothing for speech emotion recognition
    Huang, Jian; Tao, Jianhua; Liu, Bin; Lian, Zheng
    INTERSPEECH 2020, 2020: 4079-4083
  • [5] Non-contrastive self-supervised learning for utterance-level information extraction from speech
    Cho, Jaejin; Villalba, Jesus; Moro-Velazquez, Laureano; Dehak, Najim
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (06): 1284-1295
  • [6] Learning utterance-level representations for speech emotion and age/gender recognition using deep neural networks
    Wang, Zhong-Qiu; Tashev, Ivan
    2017 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2017: 5150-5154
  • [7] Prosodic word prediction using the lexical information
    Dong, HH; Tao, JH; Xu, B
    Proceedings of the 2005 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE'05), 2005: 189-193
  • [8] Using prosodic and lexical information for speaker identification
    Weber, F; Manganaro, L; Peskin, B; Shriberg, E
    2002 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I-IV, PROCEEDINGS, 2002: 141-144
  • [9] A unimodal representation learning and recurrent decomposition fusion structure for utterance-level multimodal embedding learning
    Mai, Sijie; Hu, Haifeng; Xing, Songlong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2022, 24: 2488-2501