Multimodal Neural Language Models

Authors
Kiros, Ryan [1]
Salakhutdinov, Ruslan [1]
Zemel, Richard [1]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Canadian Inst Adv Res, Toronto, ON, Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, and generate text conditioned on images. We show that in the case of image-text modelling we can jointly learn word representations and image features by training our models together with a convolutional network. Unlike many existing methods, our approach can generate sentence descriptions for images without the use of templates, structured prediction, and/or syntactic trees. While we focus on image-text modelling, our algorithms can be easily applied to other modalities such as audio.
Pages: 595-603
Number of pages: 9