Multimodal Neural Language Models

Cited by: 0
Authors
Kiros, Ryan [1]
Salakhutdinov, Ruslan [1]
Zemel, Richard [1]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Canadian Inst Adv Res, Toronto, ON, Canada
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We introduce two multimodal neural language models: models of natural language that can be conditioned on other modalities. An image-text multimodal neural language model can be used to retrieve images given complex sentence queries, retrieve phrase descriptions given image queries, and generate text conditioned on images. We show that in the case of image-text modelling we can jointly learn word representations and image features by training our models together with a convolutional network. Unlike many existing methods, our approach can generate sentence descriptions for images without the use of templates, structured prediction, or syntactic trees. While we focus on image-text modelling, our algorithms can be easily applied to other modalities such as audio.
Pages: 595-603
Page count: 9