A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by: 31
Authors
Xie, Jiayi [1]
Zhu, Yaochen [1]
Zhang, Zhibin [1]
Peng, Jian [1]
Yi, Jing [1]
Hu, Yaosi [1]
Liu, Hongyi [1]
Chen, Zhenzhong [1]
Affiliations
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Micro-video popularity prediction; Variational inference; Deep information bottleneck; Multimodal learning; Deep neural networks;
DOI
10.1145/3366423.3380004
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Predicting the popularity of a micro-video is a challenging task, due to a number of factors impacting the distribution such as the diversity of the video content and user interests, complex online interactions, etc. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that considers the uncertain factors as the randomness for the mapping from the multimodal features to the popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions based on variational inference, where only relevant features in the input modalities can be extracted into the latent representations. Then, the modality-specific hidden representations are fused through Bayesian reasoning such that the complementary information from all modalities is well utilized. Finally, a temporal decoder implemented as a recurrent neural network is designed to predict the popularity sequence of a certain micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed model in the micro-video popularity prediction task.
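The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not say which fusion rule is used, so the code below assumes a precision-weighted product of diagonal Gaussians, one common way to combine per-modality posteriors, followed by a reparameterized sample. The function names, the modality names, and the numbers are all hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_gaussians(mus, logvars):
    """Fuse diagonal Gaussians by multiplying their densities.

    Assumption: a product-of-Gaussians rule stands in for the
    "Bayesian reasoning" fusion mentioned in the abstract.
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))  # 1 / variance
    fused_var = 1.0 / precisions.sum(axis=0)
    fused_mu = fused_var * (precisions * mus).sum(axis=0)
    return fused_mu, fused_var


def sample_latent(mu, var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    return mu + np.sqrt(var) * rng.standard_normal(mu.shape)


# Two hypothetical modality posteriors over a 2-dimensional latent space.
mu_visual, logvar_visual = np.array([0.0, 2.0]), np.log([1.0, 4.0])
mu_text, logvar_text = np.array([2.0, 0.0]), np.log([1.0, 1.0])

fused_mu, fused_var = fuse_gaussians(
    [mu_visual, mu_text], [logvar_visual, logvar_text]
)
z = sample_latent(fused_mu, fused_var, np.random.default_rng(0))
```

Under this rule a modality with lower variance pulls the fused mean toward its own estimate, and the fused variance is never larger than that of any single modality. In the framework described above, a sample such as `z` would then be passed to the temporal decoder.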
Pages: 2542-2548
Number of pages: 7
Related papers
50 in total
  • [21] SPEECH-TO-SINGING CONVERSION IN AN ENCODER-DECODER FRAMEWORK
    Parekh, Jayneel
    Rao, Preeti
    Yang, Yi-Hsuan
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 261 - 265
  • [22] Multimodal Learning toward Micro-Video Understanding
    Nie L.
    Liu M.
    Song X.
    Synthesis Lectures on Image, Video, and Multimedia Processing, 2019, 9 (04): : 1 - 186
  • [23] Into the Unobservables: A Multi-range Encoder-decoder Framework for COVID-19 Prediction
    Cui, Yue
    Zhu, Chen
    Ye, Guanyu
    Wang, Ziwei
    Zheng, Kai
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 292 - 301
  • [24] Using LSTM encoder-decoder for rhetorical structure prediction
    de Moura, Gustavo Bennemann
    Feltrim, Valeria Delisandra
    2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2018, : 278 - 283
  • [25] An Encoder-Decoder Architecture for the Prediction of Web Service QoS
    Smahi, Mohammed Ismail
    Hadjila, Fethellah
    Tibermacine, Chouki
    Merzoug, Mohammed
    Benamar, Abdelkrim
    SERVICE-ORIENTED AND CLOUD COMPUTING (ESOCC 2018), 2018, 11116 : 74 - 89
  • [26] Pavement Roughness Prediction Based on Encoder-decoder Structure
    Guo R.
    Yu X.
    Tongji Daxue Xuebao/Journal of Tongji University, 2023, 51 (08): : 1182 - 1190
  • [27] Contextual encoder-decoder network for visual saliency prediction
    Kroner, Alexander
    Senden, Mario
    Driessens, Kurt
    Goebel, Rainer
    NEURAL NETWORKS, 2020, 129 : 261 - 270
  • [28] Unsupervised Encoder-Decoder Model for Anomaly Prediction Task
    Wu, Jinmeng
    Shu, Pengcheng
    Hong, Hanyu
    Li, Xingxun
    Ma, Lei
    Zhang, Yaozong
    Zhu, Ying
    Wang, Lei
    MULTIMEDIA MODELING, MMM 2023, PT II, 2023, 13834 : 549 - 561
  • [29] CEDNet: A cascade encoder-decoder network for dense prediction
    Zhang, Gang
    Li, Ziyi
    Tang, Chufeng
    Li, Jianmin
    Hu, Xiaolin
    PATTERN RECOGNITION, 2025, 158
  • [30] Retrieval Augmented Convolutional Encoder-decoder Networks for Video Captioning
    Chen, Jingwen
    Pan, Yingwei
    Li, Yehao
    Yao, Ting
    Chao, Hongyang
    Mei, Tao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (01)