InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions

Cited by: 282
Authors
Wang, Wenhai [1]
Dai, Jifeng [1, 2]
Chen, Zhe [1, 3]
Huang, Zhenhang [1]
Li, Zhiqi [1, 3]
Zhu, Xizhou [4]
Hu, Xiaowei [1]
Lu, Tong [3]
Lu, Lewei [4]
Li, Hongsheng [5]
Wang, Xiaogang [4, 5]
Qiao, Yu [1]
Affiliations
[1] Shanghai AI Lab, Shanghai, Peoples R China
[2] Tsinghua Univ, Beijing, Peoples R China
[3] Nanjing Univ, Nanjing, Peoples R China
[4] SenseTime Res, Hong Kong, Peoples R China
[5] Chinese Univ Hong Kong, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR52729.2023.01385
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early state. This work presents a new large-scale CNN-based foundation model, termed InternImage, which can obtain the gain from increasing parameters and training data like ViTs. Different from the recent CNNs that focus on large dense kernels, InternImage takes deformable convolution as the core operator, so that our model not only has the large effective receptive field required for downstream tasks such as detection and segmentation, but also has the adaptive spatial aggregation conditioned by input and task information. As a result, the proposed InternImage reduces the strict inductive bias of traditional CNNs and makes it possible to learn stronger and more robust patterns with large-scale parameters from massive data like ViTs. The effectiveness of our model is proven on challenging benchmarks including ImageNet, COCO, and ADE20K. It is worth mentioning that InternImage-H achieved a new record 65.4 mAP on COCO test-dev and 62.9 mIoU on ADE20K, outperforming current leading CNNs and ViTs.
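To make the abstract's "adaptive spatial aggregation" more concrete, the following is a minimal pure-Python sketch of the general deformable-convolution idea at a single output location: a 3x3 grid of sampling points is shifted by fractional offsets, the feature map is read at those points by bilinear interpolation, and the samples are combined with kernel weights and per-point modulation scalars. It is an illustration only, not the operator used in InternImage. The function names, the argument layout, and the zero-padding outside the map are assumptions made for this example, and in a real model the offsets and modulation scalars would be predicted from the input rather than passed in by hand.

```python
import math


def bilinear_sample(feature, y, x):
    """Bilinearly interpolate a 2D list at fractional (y, x).

    Positions outside the map contribute zero (an assumption of this sketch).
    """
    h, w = len(feature), len(feature[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feature[yi][xi]
    return value


def deformable_aggregate(feature, center, weights, offsets, modulation):
    """Aggregate a 3x3 neighbourhood around `center` with shifted sampling points.

    weights    : 9 kernel weights, one per grid point
    offsets    : 9 (dy, dx) fractional shifts (hypothetical, hand-supplied here)
    modulation : 9 scalars that rescale each sample (hypothetical, hand-supplied here)
    """
    cy, cx = center
    grid = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = 0.0
    for k, (dy, dx) in enumerate(grid):
        oy, ox = offsets[k]
        sample = bilinear_sample(feature, cy + dy + oy, cx + dx + ox)
        out += weights[k] * modulation[k] * sample
    return out


# Toy 4x4 feature map with value 4*row + col.
feature = [[4.0 * r + c for c in range(4)] for r in range(4)]
uniform = [1.0 / 9.0] * 9
ones = [1.0] * 9

# Zero offsets reduce to an ordinary 3x3 average around (1, 1).
regular = deformable_aggregate(feature, (1, 1), uniform, [(0.0, 0.0)] * 9, ones)

# Shifting every sampling point by half a pixel moves the receptive field.
shifted = deformable_aggregate(feature, (1, 1), uniform, [(0.5, 0.5)] * 9, ones)
```

With zero offsets the function behaves like a regular 3x3 convolution at that location. Non-zero offsets let the same nine weights read from different positions, which is the sense in which the sampling pattern adapts to the input.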
Pages: 14408-14419
Page count: 12