A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by: 31

Authors:

Xie, Jiayi [1]
Zhu, Yaochen [1]
Zhang, Zhibin [1]
Peng, Jian [1]
Yi, Jing [1]
Hu, Yaosi [1]
Liu, Hongyi [1]
Chen, Zhenzhong [1]

Affiliation:

[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China

Source:

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020

Funding:

National Key Research and Development Program of China;

Keywords:

Micro-video popularity prediction; Variational inference; Deep information bottleneck; Multimodal learning; Deep neural networks;

DOI:

10.1145/3366423.3380004

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Predicting the popularity of a micro-video is a challenging task, due to a number of factors impacting the distribution such as the diversity of the video content and user interests, complex online interactions, etc. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that considers the uncertain factors as the randomness for the mapping from the multimodal features to the popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions based on variational inference, where only relevant features in the input modalities can be extracted into the latent representations. Then, the modality-specific hidden representations are fused through Bayesian reasoning such that the complementary information from all modalities is well utilized. Finally, a temporal decoder implemented as a recurrent neural network is designed to predict the popularity sequence of a certain micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed model in the micro-video popularity prediction task.
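The abstract does not state which fusion rule implements the "Bayesian reasoning" step. As a minimal sketch, assuming each modality yields a diagonal Gaussian over the same latent space and that the posteriors are combined as a product of Gaussians (precision-weighted averaging), the fusion and a reparameterized draw of the latent code might look like the following. The function names, the two example modalities, and all numbers are hypothetical and are not taken from the paper.

```python
import math
import random


def fuse_gaussians(means, variances):
    """Fuse per-modality diagonal Gaussians by multiplying their densities.

    Assumption: product-of-Gaussians fusion; the paper's exact rule may differ.
    means, variances: one list of floats per modality, all of equal length.
    """
    dims = len(means[0])
    fused_mean, fused_var = [], []
    for d in range(dims):
        # Precision (inverse variance) of each modality in this dimension.
        precisions = [1.0 / v[d] for v in variances]
        var = 1.0 / sum(precisions)
        # Fused mean is the precision-weighted average of the modality means.
        mean = var * sum(p * m[d] for p, m in zip(precisions, means))
        fused_mean.append(mean)
        fused_var.append(var)
    return fused_mean, fused_var


def sample_latent(mean, var, rng):
    """Reparameterized draw: z = mean + sqrt(var) * eps, with eps ~ N(0, 1)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mean, var)]


# Hypothetical posteriors from two modalities (say, visual and textual).
visual_mean, visual_var = [0.0, 2.0], [1.0, 4.0]
text_mean, text_var = [2.0, 0.0], [1.0, 1.0]

fused_mean, fused_var = fuse_gaussians(
    [visual_mean, text_mean], [visual_var, text_var]
)
z = sample_latent(fused_mean, fused_var, random.Random(0))
```

Under this assumption, a modality with a smaller variance in a given latent dimension pulls the fused mean toward its own estimate, which is one way that complementary information from several modalities can be weighted. In the framework described above, a latent code such as `z` would then be passed to the recurrent temporal decoder.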

Pages: 2542-2548

Page count: 7
