Generative Adversarial Networks for Increasing the Veracity of Big Data

Cited by: 0

Authors:

Dering, Matthew L. [1]

Tucker, Conrad S. [2]

Affiliations:

[1] Penn State Univ, Comp Sci & Engn, University Pk, PA 16802 USA

[2] Penn State Univ, Engn Design & Ind Engn, University Pk, PA 16802 USA

Source:

2017 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA) | 2017

Funding:

National Science Foundation (USA);

Keywords:

Generative Models; Big Data; Deep Learning; GANs; Sketches;

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work describes how automated data generation can be integrated into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data, which can lead to issues that are difficult to overcome as a pipeline matures. This work describes the use of a Generative Adversarial Network to generate sketch data, such as those that might be used in a human verification task. The generated sketches were evaluated for recognizability using a crowd-sourcing methodology, which found that they were correctly recognized 43.8% of the time, in contrast to human-drawn sketches, which were correctly recognized 87.7% of the time. This method is scalable and can be used to generate realistic data in many domains and to bootstrap a dataset for training a model prior to deployment.

Pages: 2595-2602

Page count: 8
