A Multimodal Variational Encoder-Decoder Framework for Micro-video Popularity Prediction

Cited by: 29
Authors
Xie, Jiayi [1]
Zhu, Yaochen [1]
Zhang, Zhibin [1]
Peng, Jian [1]
Yi, Jing [1]
Hu, Yaosi [1]
Liu, Hongyi [1]
Chen, Zhenzhong [1]
Affiliation
[1] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Micro-video popularity prediction; Variational inference; Deep information bottleneck; Multimodal learning; Deep neural networks;
DOI
10.1145/3366423.3380004
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Predicting the popularity of a micro-video is a challenging task, due to a number of factors impacting the distribution such as the diversity of the video content and user interests, complex online interactions, etc. In this paper, we propose a multimodal variational encoder-decoder (MMVED) framework that considers the uncertain factors as the randomness for the mapping from the multimodal features to the popularity. Specifically, the MMVED first encodes features from multiple modalities in the observation space into latent representations and learns their probability distributions based on variational inference, where only relevant features in the input modalities can be extracted into the latent representations. Then, the modality-specific hidden representations are fused through Bayesian reasoning such that the complementary information from all modalities is well utilized. Finally, a temporal decoder implemented as a recurrent neural network is designed to predict the popularity sequence of a certain micro-video. Experiments conducted on a real-world dataset demonstrate the effectiveness of our proposed model in the micro-video popularity prediction task.
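The fusion step in the abstract can be illustrated with a small sketch. The record does not state the exact fusion rule, so the code below assumes one common choice: each modality yields a diagonal Gaussian over the latent variable, and the Gaussians are combined as a product of experts (precision-weighted averaging). The function names `fuse_gaussians` and `sample_latent` are hypothetical and are not taken from the paper.

```python
import math
import random


def fuse_gaussians(mus, variances):
    """Fuse per-modality diagonal Gaussians as a product of experts.

    Assumption: this precision-weighted rule stands in for the
    "Bayesian reasoning" fusion named in the abstract; no prior
    expert is included.
    mus, variances: one list of per-dimension values per modality.
    """
    dim = len(mus[0])
    fused_mu, fused_var = [], []
    for d in range(dim):
        # Precision (inverse variance) adds up across modalities.
        precision = sum(1.0 / v[d] for v in variances)
        # Means are averaged with precision weights.
        weighted = sum(m[d] / v[d] for m, v in zip(mus, variances))
        fused_var.append(1.0 / precision)
        fused_mu.append(weighted / precision)
    return fused_mu, fused_var


def sample_latent(mu, var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization)."""
    return [m + math.sqrt(v) * rng.gauss(0.0, 1.0) for m, v in zip(mu, var)]


# Two hypothetical modalities (e.g. visual and textual), 2-D latent space.
mus = [[0.0, 2.0], [2.0, 2.0]]
variances = [[1.0, 1.0], [1.0, 3.0]]
fused_mu, fused_var = fuse_gaussians(mus, variances)
z = sample_latent(fused_mu, fused_var, random.Random(0))
```

Under this assumed rule, a modality with lower variance in a given dimension pulls the fused mean toward its own estimate, and the fused variance is never larger than that of the most confident modality. In a full model, the sample `z` would then be passed to the temporal decoder.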
Pages: 2542 - 2548
Page count: 7
Related papers
50 in total
  • [1] Micro-Video Popularity Prediction Via Multimodal Variational Information Bottleneck
    Xie, Jiayi
    Zhu, Yaochen
    Chen, Zhenzhong
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 24 - 37
  • [2] Variational Memory Encoder-Decoder
    Hung Le
    Truyen Tran
    Thin Nguyen
    Venkatesh, Svetha
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [3] Micro-Video Popularity Prediction with Bidirectional Deep Encoding Network
    Jing Peiguang
    Ye Xuqing
    Liu Yu
    Su Yuting
    LASER & OPTOELECTRONICS PROGRESS, 2022, 59 (08)
  • [4] Pedestrian trajectory prediction using BiRNN encoder-decoder framework*
    Wu, Jiaxu
    Woo, Hanwool
    Tamura, Yusuke
    Moro, Alessandro
    Massaroli, Stefano
    Yamashita, Atsushi
    Asama, Hajime
    ADVANCED ROBOTICS, 2019, 33 (18) : 956 - 969
  • [5] Micro-climate Prediction - Multi Scale Encoder-decoder based Deep Learning Framework
    Kumar, Peeyush
    Chandra, Ranveer
    Bansal, Chetan
    Kalyanaraman, Shivkumar
    Ganu, Tanuja
    Grant, Michael
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 3128 - 3138
  • [6] Timber Tracing with Multimodal Encoder-Decoder Networks
    Zolotarev, Fedor
    Eerola, Tuomas
    Lensu, Lasse
    Kalviainen, Heikki
    Haario, Heikki
    Heikkinen, Jere
    Kauppi, Tomi
    COMPUTER ANALYSIS OF IMAGES AND PATTERNS, CAIP 2019, PT II, 2019, 11679 : 342 - 353
  • [7] A joint encoder-decoder error control framework for stereoscopic video coding
    Xiang, Xinguang
    Zhao, Debin
    Wang, Qiang
    Ma, Siwei
    Gao, Wen
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2010, 21 (08) : 975 - 985
  • [8] Multimodal cooperative learning for micro-video advertising click prediction
    Chen, Runyu
    INTERNET RESEARCH, 2022, 32 (02) : 477 - 495
  • [9] Encoder-Decoder Joint Enhancement for Video Chat
    Zhang, Zhenghao
    Wang, Zhao
    Ye, Yan
    Wang, Shiqi
    Zheng, Changwen
    2021 INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2021,
  • [10] Skip-attention encoder-decoder framework for human motion prediction
    Zhang, Ruipeng
    Shu, Xiangbo
    Yan, Rui
    Zhang, Jiachao
    Song, Yan
    MULTIMEDIA SYSTEMS, 2022, 28 (02) : 413 - 422