Adaptive Transformers for Learning Multimodal Representations

Cited by: 0
Author: Bhargava, Prajjwal
Affiliation:
Keywords:
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Subject classification: 081104; 0812; 0835; 1405
Abstract
The use of transformers has grown from learning language semantics to forming meaningful visiolinguistic representations. These architectures are often over-parameterized and require large amounts of computation. In this work, we extend adaptive approaches to study model interpretability and computational efficiency. Specifically, we examine adaptive attention spans, sparse attention, and structured dropout methods to understand how the attention mechanism extends to vision-and-language tasks. We further show that these approaches can reveal how the network perceives the complexity of input sequences, its sparsity preferences for different modalities, and other related phenomena.
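The "adaptive attention spans" the abstract refers to build on the soft-masking idea of Sukhbaatar et al.: each attention head learns a span parameter, and attention weights are scaled by a mask that ramps smoothly from 1 to 0 as the distance to a past token grows beyond that span. A minimal sketch of that mask (not the paper's actual code; the span and ramp values here are illustrative):

```python
def adaptive_span_mask(distance: float, span: float, ramp: float = 32.0) -> float:
    """Soft span mask m(x) = clamp((ramp + span - x) / ramp, 0, 1).

    Tokens closer than `span` are fully attended (mask = 1); tokens
    farther than `span + ramp` are ignored (mask = 0); the mask decays
    linearly in between, keeping `span` differentiable during training.
    """
    return max(0.0, min(1.0, (ramp + span - distance) / ramp))

# Mask values over the last 128 positions for a head with a learned span of 16.
mask = [adaptive_span_mask(x, span=16.0) for x in range(128)]
```

In training, `span` is a learned parameter regularized toward small values, so each head attends only as far back as the task requires, which is what lets the authors read off per-head, per-modality context preferences.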
Pages: 1-7 (7 pages)