Multi-Modal Multi-Scale Deep Learning for Large-Scale Image Annotation

Cited: 53
Authors
Niu, Yulei [1 ]
Lu, Zhiwu [1 ]
Wen, Ji-Rong [1 ]
Xiang, Tao [2 ,3 ]
Chang, Shih-Fu [4 ]
Affiliations
[1] Renmin Univ China, Beijing Key Lab Big Data Management & Anal Method, Sch Informat, Beijing 100872, Peoples R China
[2] Queen Mary Univ London, Sch Elect Engn & Comp Sci, London E1 4NS, England
[3] Samsung AI Ctr, Cambridge CB1 2JB, England
[4] Columbia Univ, Dept Elect Engn, New York, NY 10027 USA
Funding
National Natural Science Foundation of China;
Keywords
Large-scale image annotation; multi-scale deep model; multi-modal deep model; label quantity prediction;
DOI
10.1109/TIP.2018.2881928
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Image annotation aims to annotate a given image with a variable number of class labels corresponding to diverse visual concepts. In this paper, we address two main issues in large-scale image annotation: 1) how to learn a rich feature representation suitable for predicting a diverse set of visual concepts ranging from objects and scenes to abstract concepts, and 2) how to annotate an image with the optimal number of class labels. To address the first issue, we propose a novel multi-scale deep model for extracting rich and discriminative features capable of representing a wide range of visual concepts. Specifically, a novel two-branch deep neural network architecture is proposed, comprising a very deep main network branch and a companion feature-fusion branch designed to fuse the multi-scale features computed from the main branch. The deep model is also made multi-modal by taking noisy user-provided tags as model input to complement the image input. To address the second issue, we introduce a label quantity prediction auxiliary task alongside the main label prediction task to explicitly estimate the optimal number of labels for a given image. Extensive experiments are carried out on two large-scale image annotation benchmark datasets, and the results show that our method significantly outperforms the state of the art.
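The abstract describes a two-branch, multi-scale, multi-modal network with an auxiliary label-quantity head. The following is a minimal sketch of that structure, assuming a PyTorch implementation; the backbone depth, layer widths, the simple tag encoder, and the name MultiModalMultiScaleAnnotator are illustrative assumptions, not the authors' released code (see the DOI above for the actual method and hyperparameters).

```python
# Hedged sketch (not the authors' code): a two-branch multi-scale, multi-modal
# annotator with an auxiliary label-quantity head. Layer sizes and the shallow
# CNN backbone are placeholders standing in for the "very deep" main branch.
import torch
import torch.nn as nn


class MultiModalMultiScaleAnnotator(nn.Module):
    def __init__(self, num_labels: int, tag_vocab_size: int, max_labels: int = 10):
        super().__init__()
        # Main branch: stages whose intermediate feature maps give the multi-scale cues.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Companion branch: fuses the pooled multi-scale features from the main branch.
        self.fusion = nn.Sequential(nn.Linear(64 + 128 + 256, 512), nn.ReLU())
        # Tag branch: embeds noisy user-provided tags (multi-hot) to complement the image.
        self.tag_encoder = nn.Sequential(nn.Linear(tag_vocab_size, 512), nn.ReLU())
        # Main task: per-label scores; auxiliary task: predicted number of labels to keep.
        self.label_head = nn.Linear(512 + 512, num_labels)
        self.quantity_head = nn.Linear(512 + 512, max_labels)

    def forward(self, image: torch.Tensor, tags: torch.Tensor):
        f1 = self.stage1(image)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # Pool each scale, concatenate, and fuse in the companion branch.
        multi_scale = torch.cat([self.pool(f).flatten(1) for f in (f1, f2, f3)], dim=1)
        visual = self.fusion(multi_scale)
        textual = self.tag_encoder(tags)
        joint = torch.cat([visual, textual], dim=1)
        return self.label_head(joint), self.quantity_head(joint)


if __name__ == "__main__":
    model = MultiModalMultiScaleAnnotator(num_labels=81, tag_vocab_size=1000)
    scores, quantity_logits = model(torch.randn(2, 3, 224, 224), torch.rand(2, 1000))
    # At inference, keep the top-k labels per image, where k is the predicted quantity.
    k = quantity_logits.argmax(dim=1) + 1
    print(scores.shape, k)
```

The design point illustrated here is that label scoring and label-count prediction share the same fused visual-textual representation, so the quantity head acts as an auxiliary task rather than a separate model.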
Pages: 1720 - 1731
Page count: 12
Related Papers
50 records in total
  • [1] Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval
    Hua, Yan
    Yang, Yingyun
    Du, Jianhe
    [J]. ELECTRONICS, 2020, 9 (03)
  • [2] Robust Multi-Scale Multi-modal Image Registration
    Holtzman-Gazit, Michal
    Yavneh, Irad
    [J]. SIGNAL PROCESSING, SENSOR FUSION, AND TARGET RECOGNITION XIX, 2010, 7697
  • [3] Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning
    Wang, Yang Yang
Gao, Ke
    Hamad, Ali
    McCarthy, Brianna
    Kloepper, Ashley M.
    Lever, Teresa E.
    Bunyak, Filiz
    [J]. 2021 IEEE APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP (AIPR), 2021,
  • [4] Multi-Modal and Multi-Scale Oral Diadochokinesis Analysis using Deep Learning
Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, United States
    [J]. Proc. Appl. Imagery Pattern. Recogn. Workshop, 2021,
  • [5] Efficient Large-Scale Multi-Modal Classification
    Kiela, Douwe
    Grave, Edouard
    Joulin, Armand
    Mikolov, Tomas
    [J]. THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 5198 - 5204
  • [6] Deep Multi-Scale Attention Hashing Network for Large-Scale Image Retrieval
    Feng, Hao
    Wang, Nian
    Tang, Jun
    [J]. Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2022, 50 (04): 35 - 45
  • [7] Attention-Aided Generative Learning for Multi-Scale Multi-Modal Fundus Image Translation
    Pham, Van-Nguyen
    Le, Duc-Tai
    Bum, Junghyun
    Lee, Eun Jung
    Han, Jong Chul
    Choo, Hyunseung
    [J]. IEEE ACCESS, 2023, 11 : 51701 - 51711
  • [8] Large-scale Multi-modal Search and QA at Alibaba
    Jin, Rong
    [J]. PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 8 - 8
  • [9] MMpedia: A Large-Scale Multi-modal Knowledge Graph
    Wu, Yinan
    Wu, Xiaowei
    Li, Junwen
    Zhang, Yue
    Wang, Haofen
    Du, Wen
    He, Zhidong
    Liu, Jingping
    Ruan, Tong
    [J]. SEMANTIC WEB, ISWC 2023, PT II, 2023, 14266 : 18 - 37
  • [10] Multi-modal and multi-scale photo collection summarization
    Shen, Xu
    Tian, Xinmei
    [J]. MULTIMEDIA TOOLS AND APPLICATIONS, 2016, 75 (05) : 2527 - 2541