A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by: 31

Authors:

Xie, Jiayi [1]
Zhu, Yaochen [1]
Zhang, Zhibin [1]
Peng, Jian [1]
Yi, Jing [1]
Hu, Yaosi [1]
Liu, Hongyi [1]
Chen, Zhenzhong [1]

Affiliations:

[1] Wuhan University, School of Remote Sensing and Information Engineering, Wuhan, People's Republic of China

Source:

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020

Funding:

National Key Research and Development Program of China;

Keywords:

Micro-video popularity prediction; Variational inference; Deep information bottleneck; Multimodal learning; Deep neural networks;

DOI:

10.1145/3366423.3380004

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Predicting the popularity of a micro-video is a challenging task, due to a number of factors impacting the distribution such as the diversity of the video content and user interests, complex online interactions, etc. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that considers the uncertain factors as the randomness for the mapping from the multimodal features to the popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions based on variational inference, where only relevant features in the input modalities can be extracted into the latent representations. Then, the modality-specific hidden representations are fused through Bayesian reasoning such that the complementary information from all modalities is well utilized. Finally, a temporal decoder implemented as a recurrent neural network is designed to predict the popularity sequence of a certain micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed model in the micro-video popularity prediction task.
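
The fusion step described in the abstract can be illustrated with a small sketch. The abstract does not state the exact fusion rule, so this example assumes one common realization of Bayesian fusion: each modality yields a diagonal Gaussian over the latent variable, and the fused distribution is their precision-weighted product. The function names, the two modalities, and all numeric values below are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_gaussians(means, variances):
    """Fuse diagonal Gaussians by a precision-weighted product (assumed rule)."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    precisions = 1.0 / variances
    # The fused precision is the sum of the per-modality precisions.
    fused_var = 1.0 / precisions.sum(axis=0)
    # The fused mean weights each modality's mean by its precision.
    fused_mean = fused_var * (precisions * means).sum(axis=0)
    return fused_mean, fused_var


def sample_latent(mean, var, rng):
    """Reparameterized sample: z = mean + sqrt(var) * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mean.shape)
    return mean + np.sqrt(var) * eps


rng = np.random.default_rng(0)

# Hypothetical posteriors from two modality-specific encoders (2-d latent).
visual_mean, visual_var = np.array([1.0, 0.0]), np.array([1.0, 4.0])
text_mean, text_var = np.array([3.0, 2.0]), np.array([1.0, 4.0])

fused_mean, fused_var = fuse_gaussians(
    [visual_mean, text_mean], [visual_var, text_var]
)
z = sample_latent(fused_mean, fused_var, rng)

print(fused_mean)  # [2. 1.]
print(fused_var)   # [0.5 2. ]
```

Under this assumed rule, the fused variance is never larger than that of any single modality, which is one way to read the abstract's claim that complementary information from all modalities is used. In the framework described above, a sample such as `z` would then be passed to the recurrent temporal decoder to produce the popularity sequence.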

Pages: 2542-2548

Number of pages: 7
