A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by: 31
Authors
Xie, Jiayi [1]
Zhu, Yaochen [1]
Zhang, Zhibin [1]
Peng, Jian [1]
Yi, Jing [1]
Hu, Yaosi [1]
Liu, Hongyi [1]
Chen, Zhenzhong [1]
Affiliations
[1] Wuhan University, School of Remote Sensing and Information Engineering, Wuhan, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
Micro-video popularity prediction; Variational inference; Deep information bottleneck; Multimodal learning; Deep neural networks
DOI
10.1145/3366423.3380004
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Predicting the popularity of a micro-video is a challenging task, due to a number of factors impacting the distribution such as the diversity of the video content and user interests, complex online interactions, etc. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that considers the uncertain factors as the randomness for the mapping from the multimodal features to the popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions based on variational inference, where only relevant features in the input modalities can be extracted into the latent representations. Then, the modality-specific hidden representations are fused through Bayesian reasoning such that the complementary information from all modalities is well utilized. Finally, a temporal decoder implemented as a recurrent neural network is designed to predict the popularity sequence of a certain micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed model in the micro-video popularity prediction task.
Pages: 2542-2548
Page count: 7
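
The abstract states that the modality-specific latent representations are "fused through Bayesian reasoning" but gives no formula. A minimal sketch of one common way to do this, assuming each modality yields an independent Gaussian belief over the same latent variable and that the beliefs are combined by multiplying their densities (a product of Gaussians), is shown below. The function name `fuse_gaussians`, the modality names, and the numeric values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def fuse_gaussians(experts: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse 1-D Gaussian beliefs given as (mean, variance) pairs.

    The product of Gaussian densities is again Gaussian (up to scale):
    precisions (1 / variance) add, and the fused mean is the
    precision-weighted average of the individual means.
    """
    if not experts:
        raise ValueError("need at least one (mean, variance) pair")
    precision = 0.0
    weighted_mean = 0.0
    for mean, var in experts:
        if var <= 0:
            raise ValueError("variance must be positive")
        precision += 1.0 / var
        weighted_mean += mean / var
    fused_var = 1.0 / precision
    return weighted_mean * fused_var, fused_var


# Hypothetical per-modality beliefs over a single latent dimension.
visual = (1.0, 1.0)
acoustic = (3.0, 1.0)
mean, var = fuse_gaussians([visual, acoustic])
```

Under this assumption a confident modality (small variance) pulls the fused mean toward itself, and the fused variance is never larger than the smallest input variance, which is one way complementary information from several modalities can be combined.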