InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Cited by: 282
Authors
Wang, Wenhai [1 ]
Dai, Jifeng [1 ,2 ]
Chen, Zhe [1 ,3 ]
Huang, Zhenhang [1 ]
Li, Zhiqi [1 ,3 ]
Zhu, Xizhou [4 ]
Hu, Xiaowei [1 ]
Lu, Tong [3 ]
Lu, Lewei [4 ]
Li, Hongsheng [5 ]
Wang, Xiaogang [4 ,5 ]
Qiao, Yu [1 ]
Affiliations
[1] Shanghai AI Lab, Shanghai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Nanjing Univ, Nanjing, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
[5] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR52729.2023.01385
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain gains from increasing parameters and training data in the same way ViTs do. Unlike recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as its core operator, so that the model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also performs adaptive spatial aggregation conditioned on input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data, as ViTs do. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieves a new record of 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs.
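To make the "adaptive spatial aggregation conditioned on the input" idea concrete, here is a minimal sketch of a deformable-convolution block in the style the abstract describes, built on torchvision's stock deform_conv2d operator (DCNv2-style offsets plus a modulation mask predicted from the input). The block name DeformableConvBlock and its layer layout are illustrative assumptions; the operator actually proposed in the paper (DCNv3, with grouped aggregation and softmax-normalized modulation) differs in detail.

```python
# Sketch only: DCNv2-style deformable convolution via torchvision, not the paper's DCNv3.
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d


class DeformableConvBlock(nn.Module):
    """Predicts sampling offsets and a modulation mask from the input feature map,
    then aggregates features at the shifted sampling locations."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        self.kernel_size = kernel_size
        pad = kernel_size // 2
        # 2 offsets (dx, dy) per kernel tap, and 1 modulation scalar per tap,
        # both conditioned on the input itself.
        self.offset_pred = nn.Conv2d(channels, 2 * kernel_size * kernel_size,
                                     kernel_size, padding=pad)
        self.mask_pred = nn.Conv2d(channels, kernel_size * kernel_size,
                                   kernel_size, padding=pad)
        self.weight = nn.Parameter(torch.empty(channels, channels,
                                               kernel_size, kernel_size))
        nn.init.kaiming_uniform_(self.weight, a=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        offset = self.offset_pred(x)              # (N, 2*K*K, H, W)
        mask = torch.sigmoid(self.mask_pred(x))   # (N, K*K, H, W)
        return deform_conv2d(x, offset, self.weight,
                             padding=self.kernel_size // 2, mask=mask)


if __name__ == "__main__":
    block = DeformableConvBlock(channels=64)
    feats = torch.randn(2, 64, 56, 56)
    print(block(feats).shape)  # torch.Size([2, 64, 56, 56])
```

Because the offsets and mask are functions of the input, the sampling pattern adapts per location, which is what gives this operator a larger, data-dependent effective receptive field than a fixed dense kernel of the same size.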
Pages: 14408 - 14419
Page count: 12
Related Papers (50 in total)
  • [21] Large-Scale Mammography CAD with Deformable Conv-Nets
    Morrell, Stephen
    Wojna, Zbigniew
    Khoo, Can Son
    Ourselin, Sebastien
    Iglesias, Juan Eugenio
    IMAGE ANALYSIS FOR MOVING ORGAN, BREAST, AND THORACIC IMAGES, 2018, 11040 : 64 - 72
  • [22] Stoichiometric foundation of large-scale biochemical system analysis
    Beard, DA
    Qian, H
    Bassingthwaighte, JB
    MODELLING IN MOLECULAR BIOLOGY, 2004, : 1 - 19
  • [23] Centrifuge modeling of a large-scale surcharge on adjacent foundation
    Zhang, Jinzhang
    Ye, Zhenwei
    Zhang, Dongming
    Huang, Hongwei
    Han, Shijie
    Zou, Tong
    Zhang, Le
    JOURNAL OF ROCK MECHANICS AND GEOTECHNICAL ENGINEERING, 2024, 16 (08) : 3181 - 3191
  • [24] ChatGPT-like large-scale foundation models for prognostics and health management: A survey and roadmaps
    Li, Yan-Fu
    Wang, Huan
    Sun, Muxia
    RELIABILITY ENGINEERING & SYSTEM SAFETY, 2024, 243
  • [26] SCALE-SPACE TRACKING AND DEFORMABLE SHEET MODELS FOR COMPUTATIONAL VISION
    WHITTEN, G
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1993, 15 (07) : 697 - 706
  • [27] Exploring large-scale entanglement in quantum simulation
    Manoj K. Joshi
    Christian Kokail
    Rick van Bijnen
    Florian Kranzl
    Torsten V. Zache
    Rainer Blatt
    Christian F. Roos
    Peter Zoller
    Nature, 2023, 624 : 539 - 544
  • [28] Exploring Large-Scale Interactive Public Illustrations
    Thorn, Emily-Clare
    Rennick-Egglestone, Stefan
    Koleva, Boriana
    Preston, William
    Benford, Steve
    Quinn, Anthony
    Mortier, Richard
    DIS 2016: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON DESIGNING INTERACTIVE SYSTEMS, 2016, : 17 - 27
  • [29] MATHEMATICAL MODELS FOR LARGE-SCALE MILLS
    LOVEDAY, BK
    TOLMAY, AL
    BRITISH CHEMICAL ENGINEERING, 1971, 16 (2-3): 229 - &
  • [30] Large-Scale Immune Models and Visualization
    Perrin, Dimitri
    Burns, John
    ERCIM NEWS, 2008, (74): 35 - 36