Visual Thinking of Neural Networks: Interactive Text to Image Synthesis

Cited by: 4
Authors
Lee, Hyunhee [1]
Kim, Gyeongmin [1]
Hur, Yuna [1]
Lim, Heuiseok [1]
Affiliations
[1] Korea Univ, Dept Comp Sci & Engn, Seoul 02841, South Korea
Source
IEEE Access | 2021 / Vol. 9
Keywords
Cognition; Visualization; Neural networks; Generative adversarial networks; Image synthesis; Image registration; Text recognition; image generation; multimodal learning; multimodal representation; text-to-image synthesis; RECOGNITION MEMORY; PICTURE; WORDS
DOI
10.1109/ACCESS.2021.3074973
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Reasoning, a trait of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. However, neural networks now pose a challenge to this human ability. Text-to-image synthesis is a class of vision and linguistics, wherein the goal is to learn multimodal representations between the image and text features. Hence, it requires a high-level reasoning ability that understands the relationships between objects in the given text and generates high-quality images based on the understanding. Text-to-image translation can be termed as the visual thinking of neural networks. In this study, our model infers the complicated relationships between objects in the given text and generates the final image by leveraging the previous history. We define diverse novel adversarial loss functions and finally demonstrate the best one that elevates the reasoning ability of the text-to-image synthesis. Remarkably, most of our models possess their own reasoning ability. Quantitative and qualitative comparisons with several methods demonstrate the superiority of our approach.
Pages: 64510-64523
Number of pages: 14