On Scaling up a Multilingual Vision and Language Model

Cited by: 1
Authors
Chen, Xi [1]
Djolonga, Josip [1]
Padlewski, Piotr [1]
Mustafa, Basil [1]
Changpinyo, Soravit [1]
Wu, Jialin [1]
Ruiz, Carlos Riquelme [1]
Goodman, Sebastian [1]
Wang, Xiao [1]
Tay, Yi [1]
Shakeri, Siamak [1]
Dehghani, Mostafa [1]
Salz, Daniel [1]
Lucic, Mario [1]
Tschannen, Michael [1]
Nagrani, Arsha [1]
Hu, Hexiang [1]
Joshi, Mandar [1]
Pang, Bo [1]
Montgomery, Ceslee [1]
Pietrzyk, Paulina [1]
Ritter, Marvin [1]
Piergiovanni, A. J. [1]
Minderer, Matthias [1]
Pavetic, Filip [1]
Waters, Austin [1]
Li, Gang [1]
Alabdulmohsin, Ibrahim [1]
Beyer, Lucas [1]
Amelot, Julien [1]
Lee, Kenton [1]
Steiner, Andreas Peter [1]
Li, Yang [1]
Keysers, Daniel [1]
Arnab, Anurag [1]
Xu, Yuanzhong [1]
Rong, Keran [1]
Kolesnikov, Alexander [1]
Seyedhosseini, Mojtaba [1]
Angelova, Anelia [1]
Zhai, Xiaohua [1]
Houlsby, Neil [1]
Soricut, Radu [1]
Affiliations
[1] Google, Mountain View, CA 94043 USA
DOI
10.1109/CVPR52733.2024.01368
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We explore the boundaries of scaling up a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-shot (in-context) learning, as well as object detection, video question answering, and video captioning. Our model advances the state-of-the-art on most vision-and-language benchmarks considered (20+ of them). Finally, we observe emerging capabilities, such as complex counting and multilingual object detection, tasks that are not explicitly in the training mix.
Pages: 14432-14444
Number of pages: 13