A Machine Learning-Aware Data Re-partitioning Framework for Spatial Datasets

Cited by: 0

Authors:

Chowdhury, Kanchan [1]

Meduri, Venkata Vamsikrishna [1]

Sarwat, Mohamed [1]

Affiliations:

[1] Arizona State Univ, Tempe, AZ 85281 USA

Source:

2022 IEEE 38TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING (ICDE 2022) | 2022

Funding:

U.S. National Science Foundation

Keywords:

Spatial Machine Learning; Spatial Data; Training; Time Reduction; Training Data Volume Reduction; Regionalization

DOI:

10.1109/ICDE53745.2022.00227

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Spatial datasets are used extensively to train machine learning (ML) models for applications such as spatial regression, classification, clustering, and deep learning. Real-world spatial datasets are often very large, and many spatial ML algorithms represent the geographical region as a grid of spatial cells. If the grid is too fine-grained, the resulting large number of cells leads to long training times and high memory consumption during model training. To alleviate this problem, we propose a machine learning-aware spatial data re-partitioning framework that substantially reduces the granularity of the spatial grid. Our approach combines fine-grained, adjacent spatial cells of a grid into coarser cells before training an ML model. During this re-partitioning phase, we keep the information loss within a user-defined threshold so that the accuracy of the ML model is not significantly degraded. In an empirical evaluation on several real-world datasets, the best results achieved by our framework show that it can reduce the data volume and training time by up to 81% while keeping the difference in prediction or classification error below 5%, relative to a model trained on the original input dataset, for most of the ML applications. Our re-partitioning framework also outperforms state-of-the-art data reduction baselines by 2% to 20% with respect to prediction and classification errors.
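
The idea of merging adjacent fine-grained cells under a loss threshold can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out. It assumes a rectangular grid with one numeric value per cell, fixed 2x2 tiles as the candidate merges, and mean absolute deviation from the tile mean as a stand-in for "information loss". The names `repartition`, `max_loss`, and `block` are hypothetical.

```python
def repartition(grid, max_loss, block=2):
    """Merge each block x block tile into one coarse cell when the
    assumed loss measure stays within max_loss; otherwise keep the
    tile's fine-grained cells unchanged.

    Returns (labels, values): labels[i][j] is the index of the output
    cell that covers input cell (i, j); values holds one representative
    value per output cell.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]
    values = []
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            # Coordinates of the fine cells in this tile (clipped at edges).
            cells = [(i, j)
                     for i in range(r, min(r + block, rows))
                     for j in range(c, min(c + block, cols))]
            tile = [grid[i][j] for i, j in cells]
            mean = sum(tile) / len(tile)
            # Assumed loss measure: mean absolute deviation from the mean.
            loss = sum(abs(v - mean) for v in tile) / len(tile)
            if loss <= max_loss:
                # Homogeneous tile: one coarse cell replaces all fine cells.
                for i, j in cells:
                    labels[i][j] = len(values)
                values.append(mean)
            else:
                # Heterogeneous tile: keep every fine cell as its own cell.
                for i, j in cells:
                    labels[i][j] = len(values)
                    values.append(grid[i][j])
    return labels, values


# Toy grid: three near-uniform tiles and one highly variable tile.
grid = [
    [1.0, 1.0, 5.0, 9.0],
    [1.0, 1.0, 2.0, 7.0],
    [3.0, 3.1, 4.0, 4.0],
    [3.0, 2.9, 4.0, 4.0],
]
labels, values = repartition(grid, max_loss=0.1)
n_input_cells = sum(len(row) for row in grid)
reduction = 1 - len(values) / n_input_cells
print(len(values), "cells after re-partitioning; reduction =", reduction)
```

On this toy grid the three near-uniform tiles collapse into single cells and the variable tile is left intact, so 16 input cells become 7. A larger `max_loss` trades more reduction for more smoothing of the input, which mirrors the role of the user-defined threshold described in the abstract.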


Pages: 2426-2439

Page count: 14
