A Machine Learning-Aware Data Re-partitioning Framework for Spatial Datasets

Authors
Chowdhury, Kanchan [1]
Meduri, Venkata Vamsikrishna [1]
Sarwat, Mohamed [1]
Affiliation
[1] Arizona State Univ, Tempe, AZ 85281 USA
Funding
U.S. National Science Foundation (NSF);
Keywords
Spatial Machine Learning; Spatial Data; Training Time Reduction; Training Data Volume Reduction; Regionalization;
DOI
10.1109/ICDE53745.2022.00227
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Spatial datasets are used extensively to train machine learning (ML) models for applications such as spatial regression, classification, clustering, and deep learning. Many real-world spatial datasets are very large, and many spatial ML algorithms represent the geographical region as a grid of spatial cells. If the grid is too fine-grained, it contains a large number of cells, which leads to long training times and high memory consumption during model training. To alleviate this problem, we propose a machine learning-aware spatial data re-partitioning framework that substantially coarsens the spatial grid. Our approach merges fine-grained, adjacent spatial cells of a grid into coarser cells before an ML model is trained. During this re-partitioning phase, we keep the information loss within a user-defined threshold so that the accuracy of the ML model is not significantly degraded. In an empirical evaluation on several real-world datasets, the best results achieved by our framework reduce the data volume and training time by up to 81%, while keeping the difference in prediction or classification error below 5%, compared to a model trained on the original input dataset, for most of the ML applications. Our re-partitioning framework also outperforms state-of-the-art data reduction baselines by 2% to 20% with respect to prediction and classification errors.
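The core idea, merging adjacent grid cells into coarser cells while bounding information loss by a user-defined threshold, can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not specify their loss measure or merging strategy. Purely for illustration, the sketch assumes each cell holds a single numeric value, treats the "information loss" of a merged region as the spread (max minus min) of its original values, and grows regions greedily over 4-adjacent neighbours. The function name `repartition` and the `threshold` parameter are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Grid = List[List[float]]


def repartition(grid: Grid, threshold: float) -> Tuple[List[List[int]], List[float]]:
    """Greedily merge 4-adjacent cells into coarser regions (illustrative sketch).

    Hypothetical loss measure: the spread (max - min) of the original cell
    values inside a region. A neighbouring cell joins a region only if the
    spread stays <= threshold. Returns a label grid (region id per cell)
    and the mean value of each region, which replaces its member cells.
    """
    rows, cols = len(grid), len(grid[0])
    labels = [[-1] * cols for _ in range(rows)]  # -1 marks an unassigned cell
    means: List[float] = []

    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # already absorbed into an earlier region
            region_id = len(means)
            lo = hi = grid[r0][c0]
            total, count = 0.0, 0
            labels[r0][c0] = region_id
            queue = deque([(r0, c0)])
            while queue:  # breadth-first region growing from the seed cell
                r, c = queue.popleft()
                total += grid[r][c]
                count += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == -1:
                        v = grid[nr][nc]
                        # Accept the neighbour only if the region's spread stays within the threshold.
                        if max(hi, v) - min(lo, v) <= threshold:
                            lo, hi = min(lo, v), max(hi, v)
                            labels[nr][nc] = region_id
                            queue.append((nr, nc))
            means.append(total / count)
    return labels, means


# Usage: a toy 3x3 grid with three homogeneous zones.
grid = [
    [1.0, 1.1, 5.0],
    [1.2, 1.0, 5.1],
    [9.0, 9.2, 5.0],
]
labels, means = repartition(grid, threshold=0.5)
print(f"{len(means)} regions from {len(grid) * len(grid[0])} cells")  # prints "3 regions from 9 cells"
```

On this toy grid the nine fine cells collapse into three coarse regions, so a model would train on one third of the original rows. Because the merging is greedy, the result depends on the scan order and is not guaranteed to minimise the number of regions; a real framework would likely use a loss measure tied to the downstream ML task rather than a simple value spread.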
Pages: 2426-2439
Number of pages: 14