Frame-level nonverbal feature enhancement based sentiment analysis

Cited by: 0
Authors
Zheng, Cangzhi [1]
Peng, Junjie [1, 2]
Wang, Lan [1]
Zhu, Li'an [1]
Guo, Jiatao [1]
Cai, Zesu [3]
Affiliations
[1] Shanghai Univ, Sch Comp Engn & Sci, Shanghai, Peoples R China
[2] Shanghai Univ, Shanghai Inst Adv Commun & Data Sci, Shanghai, Peoples R China
[3] Harbin Inst Technol, Sch Comp Sci & Technol, Harbin, Peoples R China
Keywords
Multimodal Sentiment Analysis; Frame-level Enhancement; Pre-trained Language Models; Vector Quantization
DOI
10.1016/j.eswa.2024.125148
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal Sentiment Analysis (MSA), which draws on data from multiple modalities to identify sentiment attributes more accurately, has important applications in fields such as social media analysis, user experience evaluation, and medical health. Previous studies, however, have paid little attention to the inconsistent granularity of the initial representations of the verbal (textual) and nonverbal (acoustic and visual) modalities. As a result, the imbalance in emotional information between them complicates cross-modal interaction and ultimately degrades model performance. To solve this problem, this paper proposes a Frame-level Nonverbal feature Enhancement Network (FNENet), which improves MSA performance by narrowing this gap and integrating asynchronous affective information across modalities. Specifically, Vector Quantization (VQ) is applied to the nonverbal modalities to reduce the granularity differences and improve model performance. In addition, a Sequence Fusion (SF) mechanism integrates nonverbal information into a pre-trained language model to enhance the textual representation, so that word-level semantic expression benefits from the asynchronous affective cues preserved in unaligned frame-level nonverbal features. Extensive experiments on three benchmark datasets demonstrate that FNENet significantly outperforms baseline methods, which indicates that the model has potential for application in MSA.
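The abstract does not say how VQ is configured in FNENet, so the following is only a minimal sketch of generic nearest-neighbour vector quantization, not the authors' implementation. The function name `vector_quantize`, the toy codebook, and the two-dimensional frame features are all assumptions made for illustration; in practice a codebook would be learned and the features would be much higher-dimensional.

```python
def vector_quantize(frames, codebook):
    """Map each frame-level feature vector to its nearest codebook entry.

    frames:   list of T feature vectors (hypothetical acoustic or visual frames)
    codebook: list of K code vectors (hypothetical; learned in a real model)
    Returns the list of T code indices and the list of T quantized vectors.
    """
    indices = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every code vector.
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        # Keep the index of the closest code vector.
        indices.append(dists.index(min(dists)))
    quantized = [codebook[i] for i in indices]
    return indices, quantized


# Toy data: three code vectors and four frames (assumed values).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
frames = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7], [0.2, 0.1]]

indices, quantized = vector_quantize(frames, codebook)
print(indices)    # [0, 1, 2, 0]
print(quantized)  # each frame replaced by its nearest code vector
```

Replacing each continuous frame with one of a small set of discrete codes is one way that frame-level features can be brought closer in granularity to word-level tokens, which is the motivation the abstract gives for using VQ.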
Pages: 12