Representation Learning Through Cross-Modality Supervision

Cited: 0
Authors
Sankaran, Nishant [1 ]
Mohan, Deen Dayal [1 ]
Setlur, Srirangaraj [1 ]
Govindaraju, Venugopal [1 ]
Fedorishin, Dennis [1 ]
Affiliations
[1] Univ Buffalo, Dept Comp Sci & Engn, Buffalo, NY 14260 USA
Keywords
DOI
Not available
CLC classification
TM (Electrical Engineering); TN (Electronics & Communication Technology)
Discipline codes
0808; 0809
Abstract
Learning robust representations for applications with multiple input modalities can significantly improve their performance. Traditional representation learning methods project the input modalities onto a common subspace to maximize agreement among the modalities for a particular task. We propose a novel approach to representation learning that uses a latent representation decoder to reconstruct the target modality, thereby employing the target modality purely as a supervision signal for discovering correlations between the modalities. Through cross-modality supervision, we demonstrate that the learnt representation improves performance on the task of facial action unit (AU) recognition compared with modality-specific representations and even their fused counterparts. Our experiments on three AU recognition datasets (MMSE, BP4D, and DISFA) show strong performance gains, producing state-of-the-art results despite the absence of a modality.
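The core idea in the abstract — encoding one modality into a latent code and decoding that code into a *different* modality, so the target modality acts only as a training signal — can be sketched as follows. This is a minimal NumPy illustration, not the paper's actual architecture; all dimensions, weight initializations, and function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature sizes: a primary modality (e.g. appearance features)
# and a target modality (e.g. thermal/depth features) used only for supervision.
D_IN, D_LATENT, D_TARGET = 64, 16, 32

# Encoder maps the primary modality to a latent representation;
# the decoder reconstructs the *other* modality from that latent code.
W_enc = rng.normal(scale=0.1, size=(D_IN, D_LATENT))
W_dec = rng.normal(scale=0.1, size=(D_LATENT, D_TARGET))

def encode(x):
    # Latent representation of the primary modality.
    return np.tanh(x @ W_enc)

def decode(z):
    # Reconstruction of the target modality from the latent code.
    return z @ W_dec

def cross_modality_loss(x_primary, x_target):
    """MSE between the decoded latent code and the target modality.

    The target modality appears only inside this loss, so at test time
    the model needs just the primary modality — matching the abstract's
    claim of strong results despite the absence of a modality.
    """
    recon = decode(encode(x_primary))
    return float(np.mean((recon - x_target) ** 2))

x = rng.normal(size=(8, D_IN))       # batch of primary-modality features
y = rng.normal(size=(8, D_TARGET))   # paired target-modality features
loss = cross_modality_loss(x, y)
```

In a real implementation the encoder and decoder would be neural networks trained jointly by minimizing this reconstruction loss (typically alongside the task loss, here AU recognition), so the latent code is pushed to capture correlations between the two modalities.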
Pages: 107-114
Page count: 8
Related Papers
(50 total)
  • [1] Representation Learning for Cross-Modality Classification
    van Tulder, Gijs
    de Bruijne, Marleen
    [J]. MEDICAL COMPUTER VISION AND BAYESIAN AND GRAPHICAL MODELS FOR BIOMEDICAL IMAGING, 2017, 10081 : 126 - 136
  • [3] Cross-modality Representation Interactive Learning For Multimodal Sentiment Analysis
    Huang, Jian
    Ji, Yanli
    Yang, Yang
    Shen, Heng Tao
    [J]. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 426 - 434
  • [4] Cross-modality representation learning from transformer for hashtag prediction
    Khalil, Mian Muhammad Yasir
    Wang, Qingxian
    Chen, Bo
    Wang, Weidong
    [J]. JOURNAL OF BIG DATA, 2023, 10 (01)
  • [6] S2-Net: Self-Supervision Guided Feature Representation Learning for Cross-Modality Images
    Mei, Shasha
    Ma, Yong
    Mei, Xiaoguang
    Huang, Jun
    Fan, Fan
    [J]. IEEE-CAA JOURNAL OF AUTOMATICA SINICA, 2022, 9 (10) : 1883 - 1885
  • [7] Anatomy-Regularized Representation Learning for Cross-Modality Medical Image Segmentation
    Chen, Xu
    Lian, Chunfeng
    Wang, Li
    Deng, Hannah
    Kuang, Tianshu
    Fung, Steve
    Gateno, Jaime
    Yap, Pew-Thian
    Xia, James J.
    Shen, Dinggang
    [J]. IEEE TRANSACTIONS ON MEDICAL IMAGING, 2021, 40 (01) : 274 - 285
  • [8] Liver Segmentation via Learning Cross-Modality Content-Aware Representation
    Lin, Xingxiao
    Ji, Zexuan
    [J]. PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT XIII, 2024, 14437 : 198 - 208
  • [9] Robust video question answering via contrastive cross-modality representation learning
    Yang, Xun
    Zeng, Jianming
    Guo, Dan
    Wang, Shanshan
    Dong, Jianfeng
    Wang, Meng
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2024, 67 (10)