Visual Thinking of Neural Networks: Interactive Text to Image Synthesis

Cited by: 4

Authors:

Lee, Hyunhee ^{[1]}

Kim, Gyeongmin ^{[1]}

Hur, Yuna ^{[1]}

Lim, Heuiseok ^{[1]}

Affiliations:

[1] Korea Univ, Dept Comp Sci & Engn, Seoul 02841, South Korea

Source:

IEEE ACCESS | 2021 / Volume 9

Keywords:

Cognition; Visualization; Neural networks; Generative adversarial networks; Image synthesis; Image registration; Text recognition; image generation; multimodal learning; multimodal representation; text-to-image synthesis; RECOGNITION MEMORY; PICTURE; WORDS;

DOI:

10.1109/ACCESS.2021.3074973

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Reasoning, a trait of cognitive intelligence, is regarded as a crucial ability that distinguishes humans from other species. However, neural networks now pose a challenge to this human ability. Text-to-image synthesis is a class of vision and linguistics, wherein the goal is to learn multimodal representations between the image and text features. Hence, it requires a high-level reasoning ability that understands the relationships between objects in the given text and generates high-quality images based on the understanding. Text-to-image translation can be termed as the visual thinking of neural networks. In this study, our model infers the complicated relationships between objects in the given text and generates the final image by leveraging the previous history. We define diverse novel adversarial loss functions and finally demonstrate the best one that elevates the reasoning ability of the text-to-image synthesis. Remarkably, most of our models possess their own reasoning ability. Quantitative and qualitative comparisons with several methods demonstrate the superiority of our approach.

Pages: 64510-64523

Page count: 14
